Tuesday, 2021-01-05

*** hamalq_ has quit IRC01:59
*** sboyron has joined #opendev-meeting07:53
*** hashar has joined #opendev-meeting07:57
*** sboyron_ has joined #opendev-meeting09:04
*** sboyron has quit IRC09:04
*** sboyron_ has quit IRC09:06
*** sboyron_ has joined #opendev-meeting09:07
*** sboyron_ has quit IRC12:07
*** sboyron_ has joined #opendev-meeting12:07
*** sboyron_ has quit IRC13:08
*** sboyron_ has joined #opendev-meeting13:09
*** hashar is now known as hasharAway15:23
*** yoctozepto has quit IRC15:43
*** yoctozepto has joined #opendev-meeting15:44
*** sboyron_ has quit IRC15:51
*** sboyron has joined #opendev-meeting16:02
*** sboyron has quit IRC16:17
*** sboyron has joined #opendev-meeting16:17
*** hasharAway is now known as hashar16:43
*** sboyron has quit IRC17:22
*** sboyron has joined #opendev-meeting17:24
*** hashar is now known as hasharDinner17:49
clarkb#startmeeting infra19:01
openstackMeeting started Tue Jan  5 19:01:20 2021 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:01
openstackUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:01
*** openstack changes topic to " (Meeting topic: infra)"19:01
openstackThe meeting name has been set to 'infra'19:01
clarkbhello everyone, welcome to the first meeting of 202119:01
clarkbOthers indicated they would be delayed in joining so I'll give it a few minutes before we dive into the agenda I sent out19:01
clarkb#link http://lists.opendev.org/pipermail/service-discuss/2021-January/000160.html Our Agenda19:02
clarkb#topic Announcements19:05
*** openstack changes topic to "Announcements (Meeting topic: infra)"19:05
clarkbI didn't have any announcements. Were there others to share?19:05
* corvus joins late19:05
fungii've nothing to share19:06
clarkb#topic Actions from last meeting19:06
*** openstack changes topic to "Actions from last meeting (Meeting topic: infra)"19:06
clarkb#link http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-12-08-19.01.txt minutes from last meeting19:06
clarkbIt has been a while since our last meeting. I don't see any actions registered there. I think we can just roll forward into 202119:07
clarkb#topic Priority Efforts19:07
*** openstack changes topic to "Priority Efforts (Meeting topic: infra)"19:07
clarkb#topic Update Config Management19:07
*** openstack changes topic to "Update Config Management (Meeting topic: infra)"19:07
clarkbOver the holidays it appears that rax was doing a number of host migrations. A non zero number of these failed leaving servers unreachable19:07
clarkbother than services like ethercalc, wiki, and elasticsearch going down as a result one of the fallouts to this is our ansible playbooks try to connect to the servers and never time out piling up a number of stale ansible-playbook processes and their children on bridge19:08
clarkbthen subsequent runs timeout because the server is slow due to load19:08
clarkbWe do set an ansible ssh connection timeout but it doesn't seem to be sufficient in these cases19:09
clarkbfungi: ^ I think you had a theory for why that may be but I can't remember it right now?19:09
fungibecause ssh doesn't time out connecting19:09
fungissh authenticates and hangs19:09
clarkbI see, it's the next step that isn't being useful19:09
clarkbI wonder if we can make that better in ansible or if ansible already has tooling to try and detect that.19:10
fungibasically the servers are in a pathological condition which i think ansible's timeout mechanism doesn't take into consideration but happens rather regularly for us19:10
clarkblike maybe we can set a task timeout to some value like 2 hours19:10
clarkbanyway we don't need to solve it here. I just wanted to call that out since we hit this problem multiple times on bridge over the holidays ( and on our return)19:11
corvusunsure if this is on/off topic, but i made some changes to the root email alias, and it doesn't seem to have taken effect on many servers; is our periodic ansible run failing due to these issues?19:11
fungiit's either hanging the connection indefinitely during or immediately following authentication, i'm not sure which19:11
clarkbcorvus: base was failing, but should be running as of yesterday evening my local time19:11
clarkbcorrection: base was timing out19:11
corvusok, so i'll see if my inbox is full again tomorrow :)19:12
fungiyeah, so servers later in the sequence would have been repeatedly skipped19:12
clarkband if you notice servers are unresponsive reboots seem to correct their issues19:13
clarkbany other config management items to bring up? that was all I had19:13
clarkb#topic OpenDev19:14
*** openstack changes topic to "OpenDev (Meeting topic: infra)"19:14
clarkbOn the Gerrit tuning topic we enabled the git v2 protocol then updated our zuul images to enable it client side and that was the last gerrit tuning we did19:14
clarkbit seems to be working from a functionality perspective (zuul and git review are happy etc) but probably too early to say if it has helped with the system load issues19:15
corvusyeah, we also scheduled holidays ;)19:15
corvusif the tuning doesn't work out, let's fall back on scheduling more holidays19:16
fungiyeah, i'll be more convinced next week or the week after when everyone's turning it up to 11 again19:16
clarkbOther tuning ideas are the strong refs for jgit caches (potentially needs more memory and is scary for that reason), setting up service user and regular user thread counts to better balance CI and humans, and on the upstream mailing list there has been a ton of recent discussion from other users about tuning caches19:16
clarkbcorvus: do you know where ianw has gotten with the zuul results plugin work? I think you were helping to get that into an upstream plugin?19:17
clarkbI expect we will be able to incorporate that into our images soon, but I've not yet caught up on the status of this work19:18
fungii'll readily admit i ended up not finding time to work on the jeepyb fixes for update_bug/update_bp as other problems kept preempting my time19:18
corvusum... i haven't checked recently but last i remember is it exists in an upstream repo19:18
clarkbcorvus: cool so progress :)19:18
clarkbthe other thing ianw had brought up was using the built in WIP status for changes. In testing that we have found that Zuul doesn't understand WIP status changes as unmergeable19:19
corvus#link https://gerrit.googlesource.com/plugins/zuul-results-summary/19:19
clarkbwe mentioned this last time we had a meeting but we should discourage users from using that until Zuul does understand that status19:19
corvusi can add that feature19:19
clarkbthe preexisting WIP vote on the workflow should be used until zuul has been updated19:20
clarkbcorvus: thanks19:20
corvus#action corvus add wip support to zuul19:20
clarkbThe last Gerrit related topic I wanted to bring up was the 3.3 upgrade. guillaumec says that 3.3.1 incorporates the fix for zuul19:20
corvusthis was the comments thing (that would break 'recheck' i think)19:21
clarkbI think that means we can start looking at 3.3.1 upgrades if people have time. The upgrade does involve some changes like Non-Interactive Users group being renamed to Service Users and I am sure there are other things to consider so if we do that let's read release notes and test it (review-test can still be used for this I think)19:21
clarkbcorvus: yup19:21
corvusi haven't checked on what the final status of that is (ie, do we need to enable an option or is it transparently backwards compat)19:21
clarkboh good point we should also double check this fix doesn't need settings to be effective19:22
corvusi think people were leaning towards not requiring that, but it was a suggestion, so we should verify19:22
clarkbI don't know that I'll have time to drive a gerrit upgrade at the beginning of the year. I've got all the typical beginning of the year things distracting me. But I can help anyone else who may have time (if they don't also have beginning of the year items)19:22
clarkbianw was also working on improving our testing of gerrit in CI19:23
clarkbit might be worth getting those improvements landed then relying on it to help verify the next upgrade. I don't think we're in a rush so that may be a good idea19:23
clarkbThe other opendev related upgrade is Gitea 1.1319:24
clarkb#link https://review.opendev.org/c/opendev/system-config/+/76922619:25
clarkbthis upgrade seems to be a bigger leap than previous gitea upgrades. They have added new features like project management kanban boards19:25
clarkbour testing is decent for api checking but maybe we should hold the run job for that change now and put a repo or three in it and confirm it is happy from a ui perspective?19:25
corvusoO19:25
clarkbthis version also adds elasticsearch support for indexing. It isn't the default and I think we should upgrade to it first without worrying about elasticsearch just to sort out the other changes. Then as a followon we can work to sort out elasticsearch19:26
fungiour manage-projects test loads repos into gitea, can we depends-on or something to just take advantage of that and hold it?19:26
clarkbfungi: the gitea test creates all of the projects, but without git content19:27
clarkbfungi: all you need to do is push the content in after holding it19:27
fungiahh19:27
clarkbwe could potentially modify the job to push in content for some small repos too19:27
clarkbthat may be a good idea19:27
fungior push some ourselves after setting up necessary credentials, yeah19:27
clarkbya why don't we do that. I'll WIP the change and suggest we hold it and check the ui since the upgrade is a bit more involved than ones we have done previously19:29
clarkbAny other opendev topics to discuss or should we move on?19:30
fungiannual report?19:30
clarkbthats next though I guess technically it fits under here19:30
fungior did you have a separate topic for that?19:30
fungiahh, no worries19:30
clarkbya I had it in general topics but it is the opendev project update. Lets talk about it here19:30
* fungi should read meeting agendas19:30
clarkbWe have been asked to put together a project update for opendev in the foundation's annual report19:31
clarkb#link https://etherpad.opendev.org/p/opendev-2020-annual-report19:31
clarkbI have written a draft. But I'm happy to scrap that if others want to write one. Also happy for edits and suggestions19:31
clarkbI believe we have a week from tomorrow to get it together so this isn't a huge rush but is also a near future item to figure out19:31
fungii'm also putting some polish on our engagement metrics generator: https://review.opendev.org/72929319:34
clarkbI've been planning to do periodic rereads and edits myself too. Basically want to reread it with it being a bit more fresh then correct things as necessary19:34
clarkb#topic General topics19:34
*** openstack changes topic to "General topics (Meeting topic: infra)"19:34
clarkb#topic Bup and Borg Backups19:34
*** openstack changes topic to "Bup and Borg Backups (Meeting topic: infra)"19:34
clarkbI think we may be about ready to drop this entry from our agenda. I'll double check with ianw when holidays end.19:35
clarkbtldr aiui is we're using borg now, bup should be disabled at least on some servers19:35
clarkbwe'll keep the old bup backups around on the old volumes like we've done with previous bup rotations19:35
clarkbif you haven't yet had a chance to interact with borg and try out recovery methods that may be a good exercise. Should only take about half an hour I would expect19:36
clarkb#topic InMotion Hosted Cloud19:37
*** openstack changes topic to "InMotion Hosted Cloud (Meeting topic: infra)"19:37
clarkbThe other thing I've been working on this week is getting an account with inmotion bootstrapped so that we can spin up an openstack cloud there for nodepool resources when they are ready19:37
clarkbI have created an account and the details for that as well as our contacts are in the usual location. There is no actual cloud yet though. AIUI we are waiting on them to tell us they are ready to try bootstrapping the actual resources19:38
fungithis is the experiment where we're sort of on the hook as openstack cloud admins, right?19:39
fungiinfracloud mk2?19:39
clarkbyes, but I think we've decided that we are comfortable with a redeploy strategy using their provided management tools19:40
clarkbin theory that means the actual overhead to us is low19:40
fungiokay, so basically hands-off and if it breaks we push a button and rebuild it all19:40
clarkbexactly19:40
corvusso if it breaks or we need to upgrade, ^ that?19:40
clarkbyup19:40
corvusthat happens occasionally with our current providers too19:41
clarkbthey have also expressed interest in zuul and nodepool so maybe we can get them involved there too19:41
fungiopenstack as a service. it'll be interesting19:41
clarkb#topic Open Discussion19:42
*** openstack changes topic to "Open Discussion (Meeting topic: infra)"19:42
clarkbThat was about all I had. There are some old agenda items that I should probably clean up after thinking about them for half a second19:43
clarkbI've got meetings mon-wed next week that will have me distracted in the mornings (and maybe afternoons? I don't know if that has been sorted out yet)19:43
clarkbI should be around for our meeting next week though19:43
fungiyeah, same here (same meetings)19:44
fungibut they're half-day if memory serves, so shouldn't be entirely distracting19:44
clarkbAnything else? or should we call it here?19:46
* fungi has nothing19:46
clarkbsounds like that may be it then. Thanks everyone and we'll see you here next week19:47
fungithanks clarkb!19:47
clarkbfeel free to bring up discussions in #opendev or on the mailing list and we can pick things up there if they were missed here19:47
corvusthanks!19:47
clarkb#endmeeting19:47
*** openstack changes topic to "Incident management and meetings for the OpenDev sysadmins; normal discussions are in #opendev"19:47
openstackMeeting ended Tue Jan  5 19:47:41 2021 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)19:47
openstackMinutes:        http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-01-05-19.01.html19:47
openstackMinutes (text): http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-01-05-19.01.txt19:47
openstackLog:            http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-01-05-19.01.log.html19:47
*** sboyron has quit IRC20:06
*** hasharDinner is now known as hashar20:12
*** hashar has quit IRC22:38
