19:01:07 <clarkb> #startmeeting infra
19:01:08 <ianw> o/
19:01:09 <openstack> Meeting started Tue Aug 4 19:01:07 2020 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:10 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:12 <openstack> The meeting name has been set to 'infra'
19:01:16 <clarkb> #link http://lists.opendev.org/pipermail/service-discuss/2020-August/000068.html Our Agenda
19:01:27 <clarkb> #topic Announcements
19:01:38 <clarkb> Third and final Virtual OpenDev event next week: August 10-11
19:01:59 <fungi> well, probably not *final*, just the last one planned for 2020 ;)
19:02:10 <clarkb> final of this round
19:02:15 <clarkb> The topic is Containers in Production
19:02:26 <clarkb> which may be interesting to this group as we do more and more of that
19:02:40 <fungi> great point
19:02:52 <clarkb> I also bring it up because they use the etherpad server for their discussions, similar to a ptg or forum session. We'll want to try and be slushy with that service early next week
19:03:34 <clarkb> #topic Actions from last meeting
19:03:45 <diablo_rojo> o/
19:03:45 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-07-28-19.01.txt minutes from last meeting
19:04:01 <clarkb> ianw was going to look into incorporating non-openstack python packages into our wheel caches
19:04:13 <clarkb> I'm not sure that has happened yet as there have been a number of other distractions recently
19:04:39 <ianw> umm, started but haven't finished yet. i got a bit distracted looking at trying to parallelize the jobs so we could farm them out across two/three/n nodes
19:04:57 <fungi> seems like the only real challenge there is in designing how we want to consume the lists of packages
19:05:23 <fungi> oh, but sharding the build will be good for scaling it, excellent point
19:06:05 <fungi> also i think we're not quite as clear as we could be on how to build for a variety of python interpreter versions?
19:06:31 <clarkb> fungi: yes, we've only ever done distro version + distro python + cpu arch
19:06:37 <clarkb> which as a starting point is likely fine
19:07:48 <ianw> yeah, that was another distraction, looking at the various versions it builds
19:07:53 <fungi> building for non-default python on certain distros was not a thing for a while, don't recall if that got solved yet
19:08:53 <clarkb> I don't think so, but seems like that can be a follow-on without too much interference with pulling different lists of packages
19:08:54 <fungi> like bionic defaulting to python3.6 but people wanting to do builds for 3.7
19:09:17 <fungi> (which is packaged for bionic, just not the default)
19:09:51 <fungi> that one may specifically be less of an issue with focal available now, but will likely come up again
19:10:05 <ianw> yeah, similar with 3.8 ... which is used on bionic for some 3.8 tox jobs
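For illustration only, building the wheel cache once per interpreter (so non-default pythons like 3.7 on bionic are covered alongside the distro default) might look roughly like the Ansible task below; the host group, interpreter list, and paths are placeholders, not the actual job definition:

    - hosts: wheel-builders
      tasks:
        # Build wheels once per interpreter so each available python
        # version gets its own cache alongside the distro default.
        - name: Build wheels for each interpreter
          command: >
            {{ item }} -m pip wheel
            --wheel-dir /opt/wheelhouse/{{ item }}
            -r /opt/requirements.txt
          loop:
            - python3.6
            - python3.7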
19:10:51 <clarkb> #topic Specs approval
19:10:58 <clarkb> let's keep moving as we have a few things on the agenda
19:11:08 <clarkb> #link https://review.opendev.org/#/c/731838/ Authentication broker service
19:11:34 <clarkb> that got a new patchset last week
19:11:42 <clarkb> I need to re-review it and other input would be appreciated as well
19:11:44 <fungi> it did
19:11:54 <clarkb> fungi: anything else you'd like to call out about it?
19:11:57 <fungi> please anyone feel free to take a look
19:13:12 <fungi> latest revision records keycloak as the consensus choice, and makes a more detailed note about knikolla's suggestion regarding simplesamlphp
19:13:33 <clarkb> great, I think that gives us a more concrete set of choices to evaluate
19:13:47 <clarkb> (and they seemed to be the strong consensus in earlier discussions)
19:14:27 <clarkb> #topic Priority Efforts
19:14:37 <clarkb> #topic Update Config Management
19:14:58 <clarkb> fungi: you manually updated gerritbot configs today (or was that yesterday?). Maybe we should prioritize getting that redeployed on eavesdrop
19:15:21 <clarkb> I believe we're building a container for it and now just need to deploy it with ansible and docker-compose?
19:15:34 <fungi> sure, for now i just did a wget of the file from gitea and copied it over the old one, then restarted the service
19:15:56 <fungi> but yeah, that sounds likely
19:16:13 <clarkb> ok, I may take a look at that later this week if I find time, as it seems like users are noticing more and more often
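A rough sketch of what the docker-compose side of that deployment could look like; the image name, tag, and mount paths are assumptions for illustration rather than the final eavesdrop configuration:

    version: '2'
    services:
      gerritbot:
        # image name and tag are assumed; the real deployment may differ
        image: docker.io/opendevorg/gerritbot:latest
        restart: always
        network_mode: host
        volumes:
          # channel config managed by ansible on the host, mounted read-only
          - /etc/gerritbot:/etc/gerritbot:ro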
19:16:44 <clarkb> corvus' zuul and nodepool upgrade plan email reminds me of the other place we need to update our config management: nb03
19:16:58 <fungi> yep, we had some 50+ changed lines between all the project additions, retirements, renames, et cetera
19:17:13 <clarkb> I think we had assumed that we'd get containers built for arm64 and it would be switched like the other 3 builders, but maybe we should add zk tls support to the ansible in the shorter term?
19:17:29 <clarkb> corvus: ianw ^ you've been working on various aspects of that and probably have a better idea for whether or not that is a good choice
19:18:26 <ianw> i guess the containers are so close, we should probably just hack in support for generic wheels quickly and switch to that
19:18:34 <corvus> did we come up with a plan for nodepool arm?
19:18:38 <corvus> containers
19:18:57 <corvus> i want to say we did discuss this.. last week... but i forgot the agreement
19:19:07 <clarkb> I think the last I remember was using an intermediate layer for nodepool
19:19:16 <clarkb> but I'm not sure if anyone is working on that yet
19:19:41 <clarkb> from the opendev side I want to make sure we don't forget that if nodepool and zuul start merging v4 changes that change expectations, nb03 may be left behind
19:19:44 <ianw> my understanding was that first we'd look at building the wheels so that the existing builds were just faster
19:19:59 <corvus> ah yep, that was it
19:20:11 <corvus> build wheels, and also start on a new layer in parallel
19:20:17 <ianw> and if we still couldn't get there, look into intermediate layers
19:20:31 <corvus> k
19:20:33 <fungi> also help upstreams of various libs build arm wheels
19:20:47 <fungi> (hence the subsequent discussion about pyca/cryptography)
19:20:48 <clarkb> got it, in that case it seems we're making progress there and if we keep that up we'll probably be fine
19:21:17 <ianw> yep :) upstream became/is somewhat of a distraction from getting the generic wheels built :)
19:21:30 <corvus> so the question is: should we pin zuul to 3.x?
19:21:31 <fungi> but a good distraction in my opinion
19:21:46 <corvus> since we could be really close to breaking ourselves
19:21:52 <clarkb> corvus: or short term add zk tls support to our ansible for nb03
19:22:23 <corvus> is nb03 all ansible, or is there puppet?
19:22:31 <clarkb> I believe it is all ansible now
19:22:53 <corvus> then it's probably not too hard to add zk tls; we should probably do that
19:23:02 <clarkb> oh hrm, it still runs run-puppet on that host
19:23:13 <corvus> then i don't think we should touch it with a ten-foot pole
19:23:15 <clarkb> I guess it's ansible in that it runs puppet
19:23:40 <corvus> adding zk tls to the puppet is just a 6-month long rabbit hole
19:24:17 <clarkb> ok, in that case we should keep aware of when zuul will require tls and pin to a previous version if we don't have arm64 nb03 sorted out on containers yet
19:25:08 <ianw> i think we can definitely get it done quickly, like before next week
19:25:39 <clarkb> in that case we continue as is and push for arm64 images then imo. Thanks
19:26:58 <clarkb> any other config management topics before we move on?
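For reference, the builder side of ZooKeeper TLS is a small stanza in nodepool.yaml along the lines of the sketch below; the hostname and certificate paths are illustrative placeholders, not nb03's actual configuration:

    zookeeper-servers:
      - host: zk01.example.org
        port: 2281
    zookeeper-tls:
      # client certificate, key, and CA used to authenticate to zookeeper;
      # these paths are placeholders
      cert: /etc/nodepool/zk.crt
      key: /etc/nodepool/zk.key
      ca: /etc/nodepool/zk-ca.crt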
19:27:34 <clarkb> #topic OpenDev
19:28:10 <clarkb> we disabled gerrit's /p/ mirror serving in apache
19:28:18 <clarkb> haven't heard of any issues from that yet
19:28:27 <fungi> [and there was much rejoicing]
19:28:32 <clarkb> I figure I'll give it another week or so, then disable replicating to the local mirror and clean it up on the server
19:28:43 <clarkb> (just in case we need a quick revert if something comes up)
19:28:59 <clarkb> the next set of tasks related to the branch management are in gerritlib and jeepyb
19:29:01 <clarkb> #link https://review.opendev.org/741277 Needed in Gerritlib first as well as a Gerritlib release with this change.
19:29:10 <clarkb> #link https://review.opendev.org/741279 Can land once Gerritlib release is made with above change.
19:29:30 <clarkb> if folks have a chance to review those it would be appreciated. I can approve and tag releases as well as monitor things as they land
19:30:15 <clarkb> The other Gerrit service related topic was the status of review-test
19:30:27 <clarkb> does anyone know where it got to? I know when we did the project renames it errored on that particular host
19:30:38 <clarkb> it being the project rename playbook, since review-test is in our gerrit group
19:31:03 <clarkb> mordred: ^ if you are around you may have an update on that?
19:32:22 <clarkb> we can move on and return to this if mordred is able to update later
19:32:29 <clarkb> #topic General topics
19:32:39 <clarkb> #topic Bup and Borg Backups
19:32:59 <clarkb> ianw: I seem to recall you said that a bup recovery on hosts that had their indexes cleaned worked as expected
19:33:24 <ianw> yes, i checked on that, noticed that zuul wasn't backing up to the "new" bup server and fixed that
19:33:35 <ianw> i haven't brought up the new borg backup server and started with that, though
19:33:38 <clarkb> separately, the borg change seems to have the reviews you need to land it, then start enrolling hosts as a next step
19:33:40 <clarkb> #link https://review.opendev.org/741366
19:33:53 <ianw> yep, thanks, just been leaving it till i start the server
19:34:09 <clarkb> no worries. Just making sure we're all caught up on the progress there
19:34:23 <clarkb> tl;dr is bup is working and borg has no major hurdles
19:34:27 <clarkb> (which is excellent news)
19:34:42 <fungi> also, resistance is futile
19:34:43 <clarkb> #topic github 3rd party ci
19:35:04 <clarkb> I think ianw has learned things about zuul and github and is making progress working with pyca?
19:35:12 <clarkb> #link https://review.opendev.org/#/q/topic:opendev-3pci
19:35:31 <ianw> yes, the only other comment there was about running tests on direct merge to master
19:35:46 <fungi> so something like our "post" pipeline?
19:35:56 <ianw> ... which is a thing that is done apparently ...
19:36:16 <fungi> or more like the "promote" pipeline maybe?
19:36:19 <ianw> fungi: well, yeah, except there's a chance the tests don't work in it :)
19:36:49 <fungi> okay, so like closing the barn door after the cows are out ;)
19:37:45 <clarkb> ianw: pabelanger or tobiash may have config snippets for making that work against github
19:37:48 <ianw> we can listen for merge events, so it can be done. i was thinking of asking them to just start with pull requests, and then once we have that stable we can make it listen for master merges if they want
19:38:18 <clarkb> ya, starting with the most useful subset then expanding from there seems like a good idea
19:38:21 <ianw> yeah, it's hard to test, and i don't want it to go mad and make it look like zuul/me has no idea what's going on. mostly the latter ;)
19:38:23 <clarkb> less noise if things need work to get reliable
19:38:27 <clarkb> ++
19:39:34 <clarkb> ianw: from their side, any feedback beyond the reporting and events that get jobs run?
19:40:11 <ianw> not so far, there was some discussion over the fact that it doesn't work with the python shipped on xenial
19:40:26 <fungi> "it" being their job workload?
19:40:26 <ianw> it being the pyca/cryptography tox testing
19:40:30 <fungi> got it
19:40:52 <ianw> that didn't seem to be something that bothered them; so xenial is running 2.7 tests but not 3.5
19:40:58 <fungi> right, so they're using travis with pyenv installed python or something like that?
19:41:32 <fungi> anything in particular they've found neat/been excited about so far?
19:41:35 <ianw> yes, well it wgets a python tarball from some travis address ...
19:41:51 <fungi> totally testing like production there ;)
19:42:43 <ianw> yeah ... i mean that's always the problem. it's great that it works on 3.5, but not the 3.5 that someone might actually have, i guess
19:43:17 <ianw> but, then again, people probably run out of their own envs they've built too. at some point you have to decide what is in and out of the test matrix
19:43:58 <clarkb> ya, eventually you do what is reasonable, and that is, well, reasonable
19:44:01 <ianw> not much else to report, i'll give a gentle prod on the pull request and see what else comes back
19:44:27 <fungi> thanks for working on that!
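A minimal sketch of a check-style Zuul pipeline triggered only by GitHub pull request events, roughly the "start with pull requests" approach discussed above; the pipeline name, action list, and recheck regex are illustrative and not the actual opendev-3pci configuration:

    - pipeline:
        name: third-party-check
        manager: independent
        trigger:
          github:
            # run on new or updated pull requests ...
            - event: pull_request
              action:
                - opened
                - changed
                - reopened
            # ... and on a "recheck" comment
            - event: pull_request
              action: comment
              comment: (?i)^\s*recheck\s*$
        start:
          github:
            status: pending
        success:
          github:
            status: success
        failure:
          github:
            status: failure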
19:45:28 <clarkb> #topic Open Discussion
19:45:43 <clarkb> A few things have popped up in the last day or so that didn't make it to the agenda that I thought I'd call out
19:46:04 <clarkb> the first is that the OpenEdge cloud is being turned back on and we need to build a new mirror there. There was an ipv6 routing issue yesterday that has since been fixed
19:46:19 <clarkb> I can work on deploying the mirror after lunch today. Are we deploying those on focal or bionic?
19:46:24 <clarkb> (I think it may still be bionic for afs?)
19:47:08 <clarkb> ianw: also I think you have an update to launch-node that adds sshfp records. I guess I should use that as part of reviewing the change when booting the new mirror
19:47:10 <fungi> ianw had mentioned something about rebuilding the linaro mirror on focal to rule out fixed kernel bugs for the random shutoffs we're experiencing
19:47:34 <ianw> heh, yeah i just had a look at that
19:47:45 <ianw> there's two servers there at the moment? did you start them?
19:47:52 <clarkb> no, I haven't started anything yet
19:48:06 <clarkb> my plan was to start over to rule out any bootstrapping problems with sad network
19:48:18 <fungi> oh, also probably worth highlighting, following discussion on the ml we removed the sshfp record for review.open{dev,stack}.org by splitting it to its own a/aaaa record instead of using a cname
19:48:20 <clarkb> but if we think that is unnecessary I'm happy to review changes to update dns and inventory instead
19:48:31 <clarkb> fungi: has that change merged?
19:48:51 <fungi> i think i saw it merge last night my time
19:49:11 <fungi> yeah, looks merged
19:49:11 <ianw> there were no servers yesterday, perhaps donnyd started them. the two sticking points were ipv6 and i also couldn't contact the volume endpoint
19:49:37 <clarkb> k, I'll check with donnyd and you and sort it out in an hour or two
19:49:39 <fungi> actually still need to update the openstack.org cname for it to point to review.opendev.org instead of review01, i'll do that now
19:50:07 <clarkb> other items of note: we removed the kata zuul tenant today
19:50:25 <clarkb> I kept an eye on it since it was the first time we've removed a tenant as far as I can remember, and it seemed to go smoothly
19:50:43 <clarkb> and pip 20.2 has broken version handling for packages with '.'s in their names
19:50:59 <clarkb> 20.2.1 has fixed that and I've triggered ubuntu focal, bionic, and xenial image builds in nodepool to pick that up
19:51:09 <clarkb> it was mostly openstack noticing, as oslo packages have lots of '.'s in them
19:51:29 <clarkb> but if anyone else has noticed that problem with tox's pip version, new images should correct it
19:52:48 <clarkb> Anything else?
19:53:03 <fungi> also saw that crop up with dogpile.cache, but yeah, within openstack context
19:53:26 <fungi> it mostly manifested as constraints not getting applied for anything with a . in the name
19:53:48 <clarkb> right, it would update the package to the latest version despite other bounds
19:53:49 <fungi> so projects not using constraints files in the deps field for tox envs probably wouldn't have noticed regardless
19:53:56 <clarkb> which for e.g. zuul is probably a non-issue as it keeps up to date for most things
19:54:34 <clarkb> thanks everyone! we'll be here next week after the opendev event
19:54:38 <clarkb> #endmeeting