19:01:06 <clarkb> #startmeeting infra
19:01:07 <openstack> Meeting started Tue Nov 3 19:01:06 2020 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:08 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:10 <frickler> o/
19:01:11 <openstack> The meeting name has been set to 'infra'
19:01:19 <clarkb> link http://lists.opendev.org/pipermail/service-discuss/2020-November/000123.html Our Agenda
19:01:22 <clarkb> #link http://lists.opendev.org/pipermail/service-discuss/2020-November/000123.html Our Agenda
19:01:32 <clarkb> #topic Announcements
19:01:40 <clarkb> Wallaby cycle signing key has been activated https://review.opendev.org/760364
19:01:48 <clarkb> Please sign if you haven't yet https://docs.opendev.org/opendev/system-config/latest/signing.html
19:02:01 <clarkb> this is fungi's semi-annual reminder that we should verify and sign the contents of that key
19:02:11 <clarkb> fungi: ^ anything else to add on that topic?
19:02:53 <fungi> not really, it's in place now
19:03:17 <clarkb> The other announcement I had was that much of the world has ended/started summer time or is about to
19:03:24 <fungi> eventually i'd like to look into some opendev-specific signing keys, but haven't had time to plan how we'll handle the e-mail address yet
19:03:42 <clarkb> double check your meetings against your local timezone as things may be offset by an hour from where they were the last ~6 months
19:04:58 <clarkb> #topic Actions from last meeting
19:05:05 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-10-13-19.01.txt minutes from last meeting
19:05:26 <clarkb> I don't see any recorded actions, but it has been a while. Was there anything from previous meetings we should call out quickly?
19:06:17 <fungi> nothing comes to mind
19:06:42 <clarkb> #topic Priority Efforts
19:06:50 <clarkb> #topic Update Config Management
19:07:10 <clarkb> One thing to call out here is that docker's new rate limiting has gone into effect (or should've)
19:07:21 <clarkb> I've yet to see catastrophic results from that for our jobs (and zuul's)
19:07:25 <clarkb> but we should keep an eye on it.
19:07:55 <clarkb> If things do get really sad I've pushed up changes that will stop us funneling traffic through our caching proxies, which will diversify the source addresses and should reduce the impact of the rate limiting
19:08:21 <clarkb> frickler also reached out to them about their open source project support and they will give us rate-limit-free images, but we have to agree to a bunch of terms which we may not be super thrilled about
19:08:43 <clarkb> in particular one that worries me is that we can't use third party container tools? something like that
19:09:08 <clarkb> fungi: do you think we should reach out to jbryce about those terms and see what he thinks about them and go from there?
19:09:32 <clarkb> (I mean other opinions are good too but jbryce tends to have a good grasp on those types of use agreements)
19:09:36 <fungi> well, it was more like we can't imply that unofficial tools are supported for retrieving and running those images, it seemed like
19:10:00 <ianw> i.e. podman, etc is what that means?
19:10:06 <clarkb> right, but we actively use skopeo for our image jobs ...
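
One way to "keep an eye on it" is Docker's documented rate-limit check: fetch an anonymous pull token, then issue a HEAD request against the ratelimitpreview repository and read the RateLimit headers. A minimal sketch using only the Python standard library; the helper below is illustrative, not existing opendev tooling:

# Check the anonymous Docker Hub pull allowance for the current source
# address, using Docker's documented ratelimitpreview test repository.
import json
import urllib.request

TOKEN_URL = ("https://auth.docker.io/token"
             "?service=registry.docker.io"
             "&scope=repository:ratelimitpreview/test:pull")
MANIFEST_URL = "https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest"

def check_rate_limit():
    # Anonymous pull token, then a HEAD request (HEAD is documented as not
    # counting against the limit); the limit/remaining values come back in
    # the response headers.
    with urllib.request.urlopen(TOKEN_URL) as resp:
        token = json.load(resp)["token"]
    req = urllib.request.Request(
        MANIFEST_URL, method="HEAD",
        headers={"Authorization": "Bearer " + token})
    with urllib.request.urlopen(req) as resp:
        return (resp.headers.get("ratelimit-limit"),
                resp.headers.get("ratelimit-remaining"))

if __name__ == "__main__":
    limit, remaining = check_rate_limit()
    print("limit:", limit, "remaining:", remaining)
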
19:10:12 <frickler> maybe we should reply to docker and ask what they really mean with all that
19:10:19 <fungi> also it's specifically about the images we publish, not about how many images we can retrieve which are published by others
19:10:25 <clarkb> ianw: that was how I read it and ya clarification on that point may be worthwhile too
19:12:12 <fungi> for those who weren't forwarded a copy, here's the specific requirement: "...the Publisher agrees to...Document that Docker Engine or Docker Desktop are required to run their whitelisted images"
19:13:03 <clarkb> the good news so far is that our volume seems to be low enough that we haven't hit immediate problems. And fungi and I can see if jbryce has any specific concerns about their agreement (we can have our concerns too)?
19:13:16 <frickler> ya I wasn't sure whether the mail should be considered confidential, but I think I could paste it into an etherpad to let us agree on a reply?
19:13:21 <fungi> the other requirements were mainly about participating in press releases and marketing materials for docker inc
19:13:51 <fungi> which while maybe distasteful are probably not as hard to agree to do if we decide this is important
19:14:29 <clarkb> ya and may not even be the worst thing if we end up talking about how we build images and do the speculative builds and all that
19:14:48 <frickler> it might also be interesting to find out whether base images like python+ubuntu might already be under the free program
19:15:12 <clarkb> frickler: that's a good point too because if our base images aren't then we are only solving half the problem
19:15:16 <clarkb> I wonder if there is a way to check
19:15:20 <frickler> which might imply that we don't have a lot of issues anyway, yes
19:15:37 <fungi> try to retrieve it 101 times in an afternoon? ;)
19:16:12 <frickler> we could ask about that in our reply, too. do we have a (nearly) complete list of namespaces we use images from?
19:16:36 <clarkb> frickler: you can probably do a search on codesearch for dockerfile and FROM lines to get a representative sample?
19:16:45 <frickler> also, do we have a list of "opendev" namespaces? I know about zuul only
19:16:54 <clarkb> we have opendevorg and zuul
19:16:56 <fungi> opendevorg
19:17:00 <clarkb> well zuul has zuul and opendev has opendevorg
19:17:33 <corvus> i think "we" "have" openstack too
19:17:37 <frickler> do we talk for both or would we let zuul do a different contact
19:18:00 <clarkb> frickler: for now it is probably best to stick to opendevorg and figure out what the rules are, then we can look at expanding from there?
19:18:10 <corvus> clarkb: ++
19:18:19 <clarkb> zuul may not be comfortable with all the same rules we may be comfortable with (or vice versa). Starting small seems like a good thing
19:18:54 <fungi> kolla also publishes images to their own namespace i think, loci may as well?
19:19:04 <fungi> but yeah, i would start with one
19:19:47 <clarkb> alright anything else on this topic or should we move on?
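
A minimal sketch of the "representative sample" suggested above: scan local repository checkouts for Dockerfile FROM lines and collect the namespaces they pull from. The checkout layout is an assumption for illustration; the hosted codesearch service would answer the same question across all repos.

# Collect registry namespaces referenced by Dockerfile FROM lines under a
# directory of repository checkouts. Rough survey only: multi-stage build
# stage names will also show up in the output.
import pathlib
import re

FROM_RE = re.compile(r"^FROM\s+(?:--platform=\S+\s+)?(\S+)", re.IGNORECASE)

def namespaces(checkout_root="."):
    found = set()
    for dockerfile in pathlib.Path(checkout_root).rglob("Dockerfile*"):
        for line in dockerfile.read_text(errors="ignore").splitlines():
            match = FROM_RE.match(line.strip())
            if not match:
                continue
            # Strip any tag/digest, then keep the leading path component;
            # single-name official images ("python", "ubuntu") map to the
            # implicit "library" namespace.
            name = match.group(1).split("@")[0].split(":")[0]
            parts = name.split("/")
            found.add(parts[0] if len(parts) > 1 else "library")
    return sorted(found)

if __name__ == "__main__":
    print("\n".join(namespaces()))
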
19:21:04 <fungi> possible we could publish our images to more than one registry and then consume from one which isn't dockerhub, though that may encounter similar rate limits
19:21:13 <frickler> https://etherpad.opendev.org/p/EAfLWowNY8N96APS1XXM
19:21:24 <clarkb> yes I seem to recall tripleo ruled out quay as a quick fix because they have rate limits too
19:21:40 <clarkb> I think figuring out how we can run a caching proxy of some sort would still be great (possibly a version of zuul-registry)
19:21:42 <frickler> sadly that doesn't include the embedded links, I'll add those later, likely tomorrow
19:21:52 <clarkb> frickler: thanks
19:22:50 <clarkb> #topic OpenDev
19:23:11 <clarkb> The work to upgrade Gerrit continues. I announced to service-announce@lists.opendev.org that this is going to happen November 20-22
19:23:23 <clarkb> fungi and I will be driving that but others are more than welcome to help out too :)
19:23:47 <clarkb> on the prep and testing side of things we need to spin review-test back up on 2.13 with an up-to-date prod state and reupgrade it
19:23:49 <fungi> yep, the more the merrier
19:24:06 <clarkb> we're also investigating mnaser's idea for using a surrogate gerrit on a performant vexxhost flavor
19:24:17 <clarkb> but I think we'll test that from a 2.13 review-test clone
19:24:31 <clarkb> fungi: do you think that is something we can start in the next day or two
19:24:33 <clarkb> ?
19:24:58 <fungi> yeah, i was hoping to have time for it today, but i'm still catching my breath and catching up after the past few weeks
19:25:15 <clarkb> cool I'm hoping for time tomorrow at this point myself
19:25:48 <clarkb> ianw: any new news on the jeepyb side of things where the db access will go away?
19:27:10 <clarkb> https://review.opendev.org/758595 is an unrelated bug in jeepyb that I caught during upgrade testing if people have time for that one
19:27:41 <ianw> clarkb: sorry no didn't get to that yet, although we poked at some api bits
19:28:01 <clarkb> no worries, I think we're all getting back into the swing of things after an eventful couple of weeks
19:28:13 <ianw> it seems what we need is in later gerrits (ability to lookup id's and emails)
19:28:42 <ianw> but not in current gerrit, which makes it a bit annoying that i guess we can't pre-deploy things
19:28:44 <clarkb> oh right the api exposes that. I think the thing we need to check next on that is what perms are required to do that and we can look at that once review-test is upgraded again
19:29:55 <clarkb> we can definitely use review-test to dig into that more hopefully soon
19:30:15 <clarkb> anything else on the gerrit upgrade? or other opendev related topics?
19:31:36 <ianw> we can probably discuss out of the meeting, but i did just see that we got an email from the person presumably crawling gerrit and causing a few slowdowns recently
19:31:41 <fungi> yeah, i figured we'll leave review-test up again after the upgrade test for developing things like that against more easily
19:32:01 <fungi> ianw: yeah, they replied to the ml too, and i've responded to them on-list
19:32:19 <fungi> i dropped you from the explicit cc since you read the ml
19:32:45 <ianw> oh, ok haven't got that far yet :)
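
For context on the jeepyb discussion above, a minimal sketch of looking up an account ID and email through Gerrit's REST API rather than the old database, which is the capability ianw refers to in newer Gerrit. Host, credentials, and the example address are placeholders, and which permissions the call needs is exactly the open question noted in the meeting.

# Query Gerrit's accounts REST endpoint by email and return account IDs.
import json
import urllib.request

GERRIT = "https://review-test.opendev.org"   # placeholder host
AUTH = ("someuser", "http-password")         # HTTP password from Gerrit settings

def lookup_account(email):
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, GERRIT, AUTH[0], AUTH[1])
    opener = urllib.request.build_opener(
        urllib.request.HTTPBasicAuthHandler(password_mgr))
    url = "%s/a/accounts/?q=email:%s&o=DETAILS" % (GERRIT, email)
    with opener.open(url) as resp:
        body = resp.read().decode("utf-8")
    # Gerrit prefixes JSON responses with a ")]}'" XSSI guard line.
    accounts = json.loads(body.split("\n", 1)[1])
    return [(a["_account_id"], a.get("email")) for a in accounts]

if __name__ == "__main__":
    print(lookup_account("someone@example.com"))
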
19:33:23 <clarkb> #topic General Topics
19:33:53 <clarkb> Quick note that I intend on putting together a PTG followup email in the near future too. Just many things to catch up on and that has been lagging
19:35:21 <clarkb> #topic Meetpad Access issues from China
19:35:27 <clarkb> frickler: you added this one so feel free to jump in
19:35:42 <clarkb> It is my understanding that it appears either corporate networks or the great firewall are blocking access to meetpad
19:35:51 <clarkb> this caused neutron (and possibly others) to fall back to zoom
19:36:02 <frickler> yeah, so I just saw that one person had difficulty joining the neutron meetpad
19:36:28 <frickler> and I was thinking that it would be good if we could solve that issue
19:36:43 <fungi> any idea which it was? businesses/isps blocking web-rtc at their network borders, or the national firewall?
19:36:49 <frickler> but it would likely need cooperation with someone on the "inside"
19:36:56 <corvus> can we characterize that issue? (yes what fungi said)
19:37:14 <frickler> he said that he could only listen to audio
19:37:20 <corvus> (was it even webrtc being blocked or...)
19:37:55 <corvus> is there more than one report?
19:37:58 <fungi> yes, i would say first we should see if we can find someone who is on a "normal" (not corporate) network in mainland china who is able to access meetpad successfully (if possible), and then try to figure out what's different for people who can't
19:38:08 <fungi> i should say not corporate and not vpn
19:38:42 <fungi> there are also people outside china who can't seem to get meetpad to work for various reasons, so i would hate to imply that it's a "china problem"
19:39:39 <clarkb> maybe we can see if horace has time to do a test call with us?
19:39:42 <clarkb> then work from there?
19:40:49 <frickler> ftr there were also people having issues with zoom
19:41:20 <clarkb> I'll try to reach out to horace later today local time (horace's morning) and see if that is something we can test out
19:41:25 <frickler> and even some for whom meetpad seemed to work better than zoom, so not a general issue in one direction
19:41:59 <frickler> see the feedback etherpad https://etherpad.opendev.org/p/October2020-PTG-Feedback
19:42:09 <fungi> yes, meetpad works marginally better for me than zoom's webclient (i'm not brave enough nor foolhardy enough to try zoom's binary clients)
19:43:12 <clarkb> anything else on this subject? sounds like we need to gather more data
19:43:29 <frickler> another, likely unrelated issue, was that meetpad was dropping the etherpad window at times when someone with video enabled was talking
19:43:51 <fungi> i also had a number of meetpad sessions where the embedded etherpad stayed up the whole time, so i still am not quite sure what sometimes causes it to keep getting replaced by camera streams
19:44:39 <fungi> though yeah, maybe it's that those were sessions where nobody turned their cameras on
19:44:49 <fungi> i didn't consider that possibility
19:44:55 <clarkb> may be worth filing an upstream bug on that one
19:45:13 <corvus> we're also behind on the js client; upstream hasn't merged my pr
19:45:16 <clarkb> I did briefly look at the js when it was happening for me and I couldn't figure it out
19:45:29 <clarkb> corvus: ah, maybe we should rebase and deploy a new image and see if it persists?
19:45:35 <corvus> maybe it's worth a rebase/update before the next event
19:45:39 <clarkb> ++
19:46:00 <fungi> sometime in the next few weeks might be good for that matter
19:46:20 <corvus> something happening in a few weeks?
19:46:36 <clarkb> fungi is gonna use meetpad for socially distant thanksgiving?
19:46:50 <fungi> nah, just figure that gives us lots of time to work out any new issues
19:47:24 <corvus> ah yep. well before the next event would be best i agree
19:47:33 <clarkb> ok we've got ~13 minutes left and a couple topics I wanted to bring up. We can swing back around to this if we have time
19:47:39 <clarkb> #topic Bup and Borg Backups
19:47:53 <clarkb> ianw: I think you've made progress on this but wanted to check in on it to be sure
19:48:17 <ianw> there's https://review.opendev.org/#/c/760497/ to bring in the second borg backup server
19:48:30 <ianw> that should be ready to go, the server is up with storage attached
19:49:08 <ianw> so basically i'd like to get ethercalc backing up to both borg servers, then stage in more servers, until the point all are borg-ing, then we can stop bup
19:49:08 <clarkb> any changes yet to add the fuse support deps?
19:49:19 <ianw> todo is the fuse bits
19:49:19 <clarkb> k
19:49:19 <ianw> that's all :)
19:49:27 <clarkb> thank you for pushing on that
19:49:38 <clarkb> #topic Long term plans for running openstackid.org
19:49:53 <clarkb> Among the recent fires was that openstackid melted down during the virtual summit
19:50:15 <clarkb> it turned out there were caching problems which caused basically all the requests to retry auth and that caused openstackid to break
19:50:51 <clarkb> we were asked to scale up openstackid's deployment which fungi and I did. What we discovered doing that is if we had to rebuild or redeploy the service we wouldn't be able to do so successfully without intervention from the foundation sysadmins due to firewalls
19:51:23 <clarkb> I'd like to work with them to sort out what the best options are for hosting the service and it is feeling like we may not be it. But I want to see if others have strong feelings
19:51:48 <clarkb> they did mention they have docker image stuff now so we could convert them to our ansible + docker compose stuff if we wanted to keep running it
19:54:14 <fungi> for background, we stood up the openstackid.org deployment initially because there was a desire from the oif (then osf) for us to switch to using it, and we said that for such a change to even be on the table we'd need it to be run within our infrastructure and processes. in the years since, it's become clear that if we do integrate it in some way it will be as an identity option for our users so not
19:54:16 <fungi> something we need to retain control over
19:55:17 <fungi> currently i think translate.openstack.org, refstack.openstack.org and survey.openstack.org are the only services we operate which rely on it for authentication
19:56:19 <fungi> of those, two can probably go away (translate is running abandonware, and survey is barely used), the other could perhaps also be handed off to the oif
19:56:24 <clarkb> ya no decisions made yet, just wanted to call that out as a thing that is going on
19:56:38 <clarkb> we are just about at time now so I'll open it up to any other items really quick
19:56:41 <clarkb> #topic Open Discussion
19:57:52 <fungi> we had a spate of ethercalc crashes over the weekend. i narrowed it down to a corrupt/broken spreadsheet
19:58:14 <fungi> i'll not link it here, but in short any client pulling up that spreadsheet will cause the service to crash
19:58:27 <corvus> can/should we delete it?
19:58:44 <corvus> (the ethercalc which must not be named)
19:58:46 <fungi> and the webclient helpfully keeps retrying to access it for as long as you have the tab/window open, so it re-crashes the service again as soon as you start it back up
19:59:04 <clarkb> if we are able to delete it that seems like a reasonable thing to do
19:59:12 <fungi> yeah, i looked into how to do deletes, there's a rest api and the document for it mentions a method to delete a "room"
19:59:15 <clarkb> it's a redis data store so not sure what that looks like if there isn't an api for it
19:59:46 <fungi> i'm still not quite sure how you auth to the api, suspect it might work like etherpad's
20:00:11 <clarkb> and now we are at time
20:00:17 <clarkb> fungi: yup they are pretty similar that way iirc
20:00:21 <clarkb> thank you everyone!
20:00:23 <clarkb> #endmeeting
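
For reference on the ethercalc cleanup discussed at the end, a minimal sketch of the room delete fungi describes. The host, port, room name, and absence of auth are assumptions, and the endpoint shape follows the /_/:room pattern used by the rest of ethercalc's REST API; confirm both against its documentation before running anything against production.

# Ask ethercalc to delete a single broken room via its REST API.
import urllib.request

ETHERCALC = "http://localhost:8000"      # assumes a local/admin-side call
ROOM = "broken-room-id"                  # placeholder, not the real sheet

req = urllib.request.Request("%s/_/%s" % (ETHERCALC, ROOM), method="DELETE")
with urllib.request.urlopen(req) as resp:
    print(resp.status, resp.read().decode("utf-8", "replace"))
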