Tuesday, 2021-03-02

*** hamalq has quit IRC00:41
*** sboyron has joined #opendev-meeting06:54
*** hashar has joined #opendev-meeting08:56
*** hashar has quit IRC18:21
*** hamalq has joined #opendev-meeting18:38
clarkbanyone else here for the meeting? we will get started soon19:00
zbro/19:00
ianwo/19:00
clarkb#startmeeting infra19:01
openstackMeeting started Tue Mar  2 19:01:07 2021 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:01
openstackUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:01
*** openstack changes topic to " (Meeting topic: infra)"19:01
openstackThe meeting name has been set to 'infra'19:01
clarkb#link http://lists.opendev.org/pipermail/service-discuss/2021-March/000191.html Our Agenda19:01
clarkb#topic Announcements19:01
*** openstack changes topic to "Announcements (Meeting topic: infra)"19:01
clarkbclarkb out March 23rd, could use a volunteer meeting chair or plan to skip19:01
clarkbThis didn't make it onto the email I sent, but will be trying to spend time with the kids during their break from school19:01
clarkbif you'd like to chair the meeting on the 23rd feel free to let us know and send out a meeting agenda prior to the meeting. Otherwise I think we can likely skip it19:02
clarkb#topic Actions from last meeting19:02
*** openstack changes topic to "Actions from last meeting (Meeting topic: infra)"19:02
clarkb#link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-02-23-19.01.txt minutes from last meeting19:02
clarkbcorvus: are there changes to review to unfork jitsi meet's web component? (I think things continue to be busy with zuul so understood if not)19:03
clarkbI'll go ahead and readd the action and we can follow up on it next week19:04
clarkb#action corvus unfork jitsi meet19:04
clarkb#topic Priority Efforts19:04
*** openstack changes topic to "Priority Efforts (Meeting topic: infra)"19:04
clarkb#topic OpenDev19:04
*** openstack changes topic to "OpenDev (Meeting topic: infra)"19:04
clarkbLast week another user showed up requesting account surgery which has bumped the priority on addressing gerrit account inconsistencies back up again19:05
clarkbI've been trying to work through that since then19:05
clarkbAs suggested by fungi I have taken another approach at it which is to try and classify the conflicts based on whether or not one side of the conflict belongs to an inactive account or if the accounts appear to have been unused for significant periods of time19:06
clarkbThat has produced a list of ~35 accounts that we can go ahead and retire (which I did this morning) and then delete the conflicting external ids from the retired side19:06
clarkbI haven't done the external id deletions for all of those accounts yet, but did push up the script I am planning to use for that if people can take a look and see if that seems safe enough19:07
clarkb#link https://review.opendev.org/c/opendev/system-config/+/777846 Collecting scripting efforts here19:07
clarkbHoping to get through that chunk of fixes today, then rerun the consistency check for an up to date list of issues which can be fed back into the audit to get up to date classifications on accounts' recent usage19:08
clarkbThere are a good number of accounts that do appear to have not been used recently. For those I think we can go through the same process as above (either pick an account out of the conflicting set to "win" or retire and remove external ids from all of them)19:08
clarkbI did notice that there may be some accounts that are only used to query the server though and my organizing based on code reviews and pushes is probably incomplete19:09
clarkbI reached out to weshay|ruck about one of these (a tripleo account) to see if we can better capture those use cases19:09
clarkbit continues to feel like slow going, but it is progress and the more I look at things the better I understand them19:10
clarkbOne thing that occurred to me is that setting accounts inactive is a relatively low cost option. That makes me think we should do this in a staged process where we set the accounts inactive then wait a week or whatever for people to complain (can send email about this too)19:10
clarkbthen if people complain we reactivate their accounts and move them out of the list, for the rest we remove the external ids and fix the conflicts19:11
clarkbanyway that is still a ways away as I want to refine the classifications further once this set is done19:11
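[Editor's sketch] The classification step described above could look roughly like the following. This is a minimal illustration only, not the script under review in 777846: the `Account` record, the `classify` helper, and the one-year staleness threshold are all hypothetical. As noted in the discussion, judging usage only by code reviews and pushes is probably incomplete, since some accounts are used only to query the server.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass
class Account:
    # Hypothetical record; real data would come from the Gerrit audit.
    account_id: int
    active: bool
    last_used: Optional[datetime]  # None if no review or push was recorded


def classify(
    accounts: List[Account],
    now: datetime,
    stale_after: timedelta = timedelta(days=365),  # assumed threshold
) -> Tuple[List[int], List[int]]:
    """Split conflicting accounts into (keep, retire_candidates).

    An account is a retirement candidate if it is already inactive or
    has no recorded use within the stale_after window.
    """
    keep: List[int] = []
    retire: List[int] = []
    for acct in accounts:
        stale = acct.last_used is None or now - acct.last_used > stale_after
        if not acct.active or stale:
            retire.append(acct.account_id)
        else:
            keep.append(acct.account_id)
    return keep, retire
```

In the staged process, the retirement candidates would first be set inactive, and their external ids removed only after the waiting period passes without complaints.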
clarkbAny other OpenDev topics to discuss before we move on?19:11
ianwno, but thanks for working on this tricky set of circumstances! :)19:12
fungii'm working on pushing git-review 2.0.0.0 release candidates now to exercise release automation for it in preparation for a new release19:13
fungiwe've got everything merged at this point which was slated for release19:13
clarkbcool, the big change being git-review will require python3?19:13
fungirc1 is in the release pipeline as we speak19:13
fungiyes, no more 2.7 support (thanks zbr for the change for that)19:14
clarkb#topic General topics19:16
*** openstack changes topic to "General topics (Meeting topic: infra)"19:16
clarkb#topic OpenAFS cluster status19:16
*** openstack changes topic to "OpenAFS cluster status (Meeting topic: infra)"19:16
clarkbianw is adding a third afs db server in order for us to have proper quorum in the cluster19:16
clarkbapparently 2 is not enough (not surprising)19:16
clarkbianw: anything additional to add to that? changes to review maybe?19:16
ianwyeah, that third server is active and has validated that it works ok with focal, so i'll take on the in-place upgrades we've talked about19:17
clarkbexcellent19:17
clarkbAlso I noticed that afs01.dfw's vicepa is fairly full19:17
ianwcouple of small reviews are https://review.opendev.org/c/opendev/system-config/+/778127 and https://review.opendev.org/c/opendev/system-config/+/77812019:17
clarkbI noticed that a few weeks ago and pushed up some changes to work towards dropping fedora-old (not sure of the exact version)19:18
clarkbThere are probably other ways we could prune the data set if others have ideas that would be great19:18
ianwahh, ok, i can go through and look for that and deal.  fedora is hitting up against our -minimal issues with tools on build hosts, the container-build stuff is working but needs polishing19:19
clarkbianw: ya we have fedora-old, fedora-intermediate, and fedora-current. It's -current that has trouble, most testing seems to be on -intermediate so I think we can drop -old19:20
clarkbbut if you can double check that and review some of the changes that would probably be good19:20
ianwwill do19:20
clarkb#topic Borg Backups19:21
*** openstack changes topic to "Borg Backups (Meeting topic: infra)"19:21
clarkbianw: fungi: any new insight into why gitea db backups pushing to the vexxhost dest has trouble?19:21
ianwno, but i have to admit i haven't looked fully.  i think i'll try and run the mysqldump a few times and see if that is dying locally19:22
clarkb++ that seems like a good test19:22
fungiahh, yeah i got sidetracked after getting as far as finding the disconnect error in the mariadb logs19:22
ianwthe fact that it died three days in a row at the same row number seems very suspicious19:22
clarkbanything else on this topic?19:23
ianwand that the filesystem part doesn't seem to have issues; and no other host is reporting issues19:23
ianwnope, otherwise, i've retired the old servers, we have a 1tb drive attached to the RAX host with the latest rotation of bup backups if we require19:24
clarkbthank you!19:24
clarkb#topic Server Updates19:24
*** openstack changes topic to "Server Updates (Meeting topic: infra)"19:24
clarkbI've made some progress with zuul server rolling replacements19:24
clarkball the mergers are focal now and the old servers have been cleaned up (though it just occurred to me I still have dns records to clean up)19:25
clarkb#link https://review.opendev.org/c/opendev/system-config/+/778227 is the next step for executor replacements19:25
clarkbbasically if you think the new ze server is happy (from what I can see it is, including tarball publishing jobs to afs)19:25
clarkber if ^ then please help land that change. I'll delete the old server then start doing some replacements in larger batches (3 at a time?)19:25
clarkbAnyone else looking at updates other than afs servers, refstack, and zuul?19:26
ianwyeah i started on review19:26
clarkboh ya I saw your email to upstream about the mariadb weirdness19:27
ianwhowever we've got ourselves in a bit of a tangle with review01.<openstack|opendev>.org19:27
ianwso we have A dns records for review01.opendev.org19:27
ianwi proposed removing them for the new server ... https://review.opendev.org/c/opendev/zone-opendev.org/+/77792619:27
ianwi need to spend some time with system-config and see what we can do19:28
ianwcalling the new server "review02.opendev.org" *may* help a little?19:28
clarkbmy poor memory says we may have done that for a reason19:29
clarkbhrm ya and with the LE records too19:29
clarkbya we use the dns records there to validate the ssl cert on the server :/19:30
fungigit history might point to why we added it19:30
fungibut that sounds likely19:30
ianwbut do we need a cert for review01.opendev.org?19:30
ianwi don't feel like anyone is accessing it like that19:30
clarkbI think the major reason for it may be for sshfp since we sshfp to review01 for 22 but to review.opendev.org for 2941819:31
fungi#link https://review.opendev.org/744557 Split review's resource records from review01's19:31
clarkband ya maybe we can stop doing a review01 altname and just generate certs for review.opendev and review.openstack19:31
fungisshfp record was breaking ssh access to gerrit's ssh api port19:32
clarkband the sshfp records aren't super important right now iirc19:32
clarkbfungi: ya so we moved it to review01 from review19:32
clarkbso ya I think we are ok if we reduce the LE tie in and maybe clean up sshfp records too for completeness19:33
ianwok, i can look at that, split 777926 up into two steps19:33
fungimakes sense19:33
clarkbianw: then for bootstrapping the new host with ansible we want to do something similar to what review-test did without replication config, etc19:33
clarkbanything else on the topic of server upgrades?19:34
ianwyep19:34
ianwone more thing, what did we decide about review-dev?19:34
clarkbianw: we should clean it up though that hasn't happened yet19:34
ianwok, i'll do that too19:34
clarkbmight want to double check with mordred and corvus et al that they don't have anything on that server to retain (shouldn't but it was a sandbox for a while)19:35
clarkbwe also need to get review-test back into ansible but that is probably less urgent19:35
clarkb#topic New refstack server19:36
*** openstack changes topic to "New refstack server (Meeting topic: infra)"19:36
clarkbLooked like there was some new testing being done to sort out some problems? I didn't catch what the current problems are though19:36
kopecmartini have 2 patches up for that19:37
kopecmartin#link https://review.opendev.org/c/opendev/system-config/+/77629219:37
kopecmartinwhen merged, will the held server be updated automatically?19:37
kopecmartinI'd like to test it one more time and then let's go to production finally19:37
ianwkopecmartin: nope, that's not being ansiblised19:37
kopecmartinok, np, I'll do it manually then19:38
ianwhowever, we could update things on the held bridge and run it manually to confirm without having to run new nodes19:38
clarkbianw: kopecmartin  for that first change I think that may be a noop19:38
kopecmartinianw: or that, whatever you say :)19:38
clarkbbecause we are already redirecting everything under / to localhost:800019:38
clarkbI want to say there is a way to define the refstack api path in refstack itself19:38
clarkbapi_url =<%= scope.lookupvar("::refstack::params::api_url") %> is what puppet does19:39
kopecmartinclarkb: hmm, so maybe that's why refstack server didn't behave as expected when i tested it , because of the '/ to localhost:8000'19:39
clarkbI think you may want to set the config such that the api_url has an /api at the end of it19:39
kopecmartinyeah yeah, i was playing with the api_url option, but it was ignored and i couldn't figure out why19:39
kopecmartinnow it makes sense19:40
clarkbI think you may also have to set a js config value too19:40
clarkbI remember looking at it and leaving some comments recently19:40
kopecmartinok then, let me get back to it and i'll implement updates shortly and ping you back so that it's moving forward19:40
clarkbkopecmartin: ianw  in the ansible template for refstack config try changing api_url = {{ refstack_url }} to api_url = {{ refstack_url }}/api maybe?19:41
clarkbbut ya I'm not sure that apache config change will help since it is already sending things to /19:41
ianwi thought we did that, but maybe not19:41
kopecmartinwe did , but it didn't work19:41
clarkbI see19:41
kopecmartinit seemed like the opt was ignored or something like that19:41
clarkboh interesting the puppet side runs it as a wsgi app WSGIScriptAlias /api /etc/refstack/app.wsgi19:42
kopecmartintherefore I reverted that and put the proxy pass there (as workaround)19:42
clarkbso ya maybe the real fix is to switch to using it as wsgi?19:42
clarkbthat gets awkward with containers though19:42
clarkbanyway sounds like you're ahead of me in the debugging so I should get out of the way :)19:43
clarkbAnything else on this ?19:43
kopecmartinso the WSGIScriptAlias /api /etc/refstack/app.wsgi is an equivalent for the ProxyPass I wrote?19:43
clarkbkopecmartin: no, it runs a python wsgi process under apache and does wsgi "proxying" instead19:44
clarkbthey are similar in some ways but also different19:44
kopecmartinah19:45
ianwyeah it seems to be almost running the api bits separately19:45
clarkbalright lets move on19:47
clarkb#topic Bridge disk use19:47
*** openstack changes topic to "Bridge disk use (Meeting topic: infra)"19:47
clarkbfrickler discovered that /root/.cache is consuming a fair bit of disk. Particularly caches for python entrypoints and pip19:47
clarkbdoes anyone know what caches entrypoints (is it pkg_resources?) and if it is safe to simply remove the entire dir?19:48
clarkbI think my concern is that if a python process is running it may rely on that file being present after it has pkg_resourced19:48
ianwi think we could just mtime delete anything older than a day though?19:49
clarkbianw: ya we could do that too, but there are so many files I expect the stating for that to be slow. But maybe that is fine19:49
clarkbjust start it and then wait :)19:49
ianwyeah, i was thinking a cron job19:49
ianwpresumably it's not "leaking" as such, as it's under .cache ...19:50
clarkbwhat is weird is I can't find evidence that this is part of python packaging proper19:51
ianw".cache/python-entrypoints" does not give many hits19:52
clarkbI do have a much smaller number of entries on my local system from zuul testing it looks like19:52
fungicould it be stevedore?19:52
mordredianw: I do not have anything on review-dev19:53
clarkbfungi: ya maybe something in stevedore or ansible pulling in etc19:53
fungior something similar caching entrypoints, anyway19:53
clarkbI think it would be worthwhile to try and source it before we go and delete them so that we understand it better (and its expected rate of growth)19:53
clarkbI can probably take a look at that after getting this batch of gerrit accounts sorted19:53
mordredI think it's stevedore19:54
mordredrandom other hit on the internet: https://github.com/cpoppema/docker-flexget/issues/82 - also mentions stevedore - and I think I remember someone saying something about doing that a while back for performance19:54
mordredstevedore/_cache.py:        return os.path.join(base_path, 'python-entrypoints')19:55
clarkbthat looks incredibly suspicious :)19:55
* mordred puts on his useful-for-the-day hat19:55
fungii was hoping for something incredibly delicious. i shouldn't have skipped lunch19:55
clarkbbased on that it should be fine to do a time based clearing, but maybe we should also file a bug19:55
ianwand the latest patch is where you can drop a . file to stop it caching19:55
clarkbianw: oh ha someone else already hit this then I bet :)19:56
ianwAdd possibility to skip caching endpoints to the filesystem when '.disable' file is present in the cache directory.19:56
clarkb(the idea of a cache seems like a good one, I wonder why it needs so many cache files though)19:56
ianwis that coming from cloud launcher?  what exactly is using stevedore?19:56
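[Editor's sketch] The time-based clearing suggested above (delete anything older than a day, run from cron) might look something like this. It is a rough illustration under stated assumptions: `prune_old_files` and its parameters are hypothetical names, the one-day default mirrors the suggestion in the discussion, and the directory to prune (for example the `python-entrypoints` directory under `/root/.cache`) is passed in by the caller. It uses `os.scandir` so entries are streamed rather than listed all at once, since the directory holds a very large number of files.

```python
import os
import time
from typing import List, Optional


def prune_old_files(
    cache_dir: str,
    max_age_seconds: float = 86400,  # one day, as suggested in the meeting
    now: Optional[float] = None,
    dry_run: bool = True,
) -> List[str]:
    """Remove regular files in cache_dir whose mtime is older than the cutoff.

    Returns the sorted names of files that were removed (or would be
    removed when dry_run is True). Subdirectories and symlinks are skipped.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_seconds
    removed: List[str] = []
    with os.scandir(cache_dir) as entries:
        for entry in entries:
            if not entry.is_file(follow_symlinks=False):
                continue
            if entry.stat(follow_symlinks=False).st_mtime < cutoff:
                if not dry_run:
                    os.unlink(entry.path)
                removed.append(entry.name)
    return sorted(removed)
```

Defaulting to `dry_run=True` lets the list of candidates be inspected before anything is deleted, which fits the wish to understand the cache's source and growth rate first.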
clarkbwe have just a few minutes left so one more thing19:56
clarkb#topic InMotion OpenStack as a Service19:57
*** openstack changes topic to "InMotion OpenStack as a Service (Meeting topic: infra)"19:57
clarkbThis has ended up towards the bottom of my priority list due to other distractions. I think getting ssl sorted out on this system would still be worthwhile if anyone else wants to take a look (you basically need to figure out how to configure kolla then rerun kolla against the cluster)19:57
clarkbI think you can even tell kolla to just make a self signed cert as a first step19:58
clarkbanyway I think we are all busy so don't necessarily expect anyone to jump on that, but thought I would mention it so it doesn't get completely forgotten19:58
clarkb#topic Open Discussion19:58
*** openstack changes topic to "Open Discussion (Meeting topic: infra)"19:58
clarkbAnything else in our minute and a half remaining?19:58
fungithe git-review 2.0.0.0rc1 tag seems to have worked fine19:58
clarkbexciting19:59
fungi#link https://pypi.org/project/git-review/2.0.0.0rc1/19:59
clarkbdid you want people to install it and use it for a bit or was that mostly to exercise the publishing?19:59
fungii just noticed though that the release notes could be better organized19:59
fungimostly to exercise publishing though we can ask folks to test it briefly19:59
fungi#link https://review.opendev.org/778257 will clean up release notes19:59
clarkbThat is all we have scheduled time for. Thank you everyone and feel free to continue discussion in #opendev20:01
clarkb#endmeeting20:01
*** openstack changes topic to "Incident management and meetings for the OpenDev sysadmins; normal discussions are in #opendev"20:01
openstackMeeting ended Tue Mar  2 20:01:04 2021 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)20:01
openstackMinutes:        http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-03-02-19.01.html20:01
openstackMinutes (text): http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-03-02-19.01.txt20:01
openstackLog:            http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-03-02-19.01.log.html20:01
fungithanks clarkb!20:02
*** hashar has joined #opendev-meeting20:30
*** sboyron has quit IRC21:30
*** hashar has quit IRC23:10
