19:01:18 <clarkb> #startmeeting infra
19:01:20 <openstack> Meeting started Tue Mar 30 19:01:18 2021 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:21 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:23 <openstack> The meeting name has been set to 'infra'
19:01:27 <clarkb> #link http://lists.opendev.org/pipermail/service-discuss/2021-March/000199.html Our Agenda
19:01:57 <diablo_rojo> o/
19:01:57 <clarkb> I wasn't around last week, but will do my best :) feel free to jump in to help keep things going in the right direction
19:02:02 <ianw> o/
19:02:59 <clarkb> #topic Announcements
19:03:33 <clarkb> I didn't have any. Do others?
19:03:52 <fungi> i don't think so
19:03:59 <fungi> gitea was upgraded
19:04:07 <fungi> keep an eye out for oddities?
19:04:27 <clarkb> ++
19:04:27 <fungi> zuul was recently updated to move internal scheduler state into zookeeper
19:04:37 <fungi> keep an eye on that too
19:05:11 <clarkb> #topic Actions from last meeting
19:05:19 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-03-23-19.01.txt minutes from last meeting
19:05:32 <clarkb> ianw had an action to start asterisk retirement. I saw an email to service-discuss about it.
19:05:53 <ianw> no response on that, so i guess i'll propose the changes soon
19:06:06 <clarkb> ianw: do you want to keep the action around until the changes are up and/or landed? seems to be moving along at least
19:06:20 <ianw> sure, make sure i don't forget :)
19:06:36 <clarkb> #action ianw Propose changes for asterisk retirement
19:06:47 <clarkb> #topic Priority Efforts
19:06:54 <clarkb> #topic OpenDev
19:07:03 <clarkb> as mentioned we upgraded gitea from 1.13.1 to 1.13.6
19:07:31 <clarkb> keep an eye out for weirdness.
19:07:48 <clarkb> Do we also want to reenable project description updates and see if 1.13.6 handles that better? or maybe get the token usage change in first?
19:08:34 <ianw> tokens seem to maybe isolate us from any future hashing changes, but either way i think we can
19:09:04 <clarkb> ianw: maybe I should push up the description update change again and then compare dstat results with and without the token use.
19:09:20 <clarkb> that should give us a good indication of whether or not 1.13.6 has improved hashing enough?
19:09:25 <fungi> maybe
19:09:54 <ianw> #link https://review.opendev.org/c/opendev/system-config/+/782887
19:09:54 <fungi> it was never a complete smoking gun that project management changes triggered the cpu load
19:09:54 <ianw> for anyone reading without context :)
19:10:09 <fungi> they would sometimes overload *a* gitea backend and the rest would be perfectly happy
19:10:25 <clarkb> ya I suspect it has to do with background load as well
19:10:32 <fungi> so if we want to experiment in that direction, we'll need to leave it in that state for a while and it's not a surety
19:10:36 <clarkb> due to the way we load balance we don't necessarily get a very balanced load
19:11:50 <clarkb> I also made some new progress on the gerrit account classification process before taking time off
19:12:10 <clarkb> if you can review groups in review:~clarkb/gerrit_user_cleanups/notes.20210315 and determine if they can be safely cleaned up like previous groups, that would be great
19:12:23 <clarkb> I'll pick that up again once others have had a chance to cross check my work
19:12:29 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/780663 more user auditing improvements
19:12:41 <clarkb> that is a related scripting improvement. Looks like I have one +2 so I may just approve it today
19:13:01 <clarkb> essentially I had the scripts collect a bunch of data into yaml, then I could run "queries" against it to see different angles
19:13:12 <clarkb> the different angles are written down in the file above and can be cross checked
19:14:31 <clarkb> #topic Update Configuration Management
19:14:42 <clarkb> Any new config mgmt updates we should be aware of/review?
19:16:08 <fungi> i don't think so
19:16:19 <clarkb> #topic General Topics
19:16:30 <clarkb> #topic Server Upgrades
19:16:59 <clarkb> I did end up completing the upgrades for zuul executors and mergers and nodepool launchers
19:17:09 <clarkb> That leaves us with the zookeeper cluster and the scheduler itself
19:17:25 <clarkb> I have started looking at the zk upgrade and writing notes on an etherpad
19:17:26 <clarkb> #link https://etherpad.opendev.org/p/opendev-zookeeper-upgrade-2021
19:18:02 <clarkb> that etherpad proposes two options we could take to do the upgrade. If y'all can review it and make sure the plans are complete and/or express an opinion on which path you would like to take, I can boot instances and keep pushing on that
19:20:02 <clarkb> #topic Deploy new refstack server
19:20:10 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/781593
19:20:26 <clarkb> this change merged yesterday. ianw: should I go ahead and remove this item from the meeting agenda?
19:20:51 <ianw> yep, deployment job ran so i'm not aware of anything else to do there
19:21:01 <clarkb> cool I'll get that cleaned up
19:22:15 <clarkb> #topic PTG Planning
19:22:31 <clarkb> I did submit a survey and put us on the schedule last week
19:22:52 <clarkb> the event runs April 19-23 and I selected Thursday April 22, 1400-1600 UTC and 2200-0000 UTC for us
19:23:16 <clarkb> the first time should hopefully work for those in EU timezones and the second for those in asia/pacific/australia
19:23:45 <clarkb> my thought on that was we could do office hours and try to help some of our new project-config reviewers get up to speed or help other projects with infra related items
19:24:31 <clarkb> if the times just don't work or you think we need more or less, let me know. I indicated we may need to rearrange scheduling when I filled out the survey
19:24:51 <clarkb> #topic docs-old volume cleanup
19:25:14 <clarkb> not sure if this is still current but it was on the agenda so here it is :)
19:25:52 <ianw> oh it was from when i was clearing out space the other day
19:26:05 <ianw> do we still need docs-old?
19:26:39 <fungi> we do not
19:26:47 <clarkb> is docs-old where we stashed the really old openstack documentation so that it could be found if people have really old installations but otherwise wouldn't show up in google results?
19:27:12 <fungi> that was kept around for people to manually copy things from if we failed to rebuild them during the transition to zuul v3
19:27:39 <fungi> i think anything we weren't actively building but was relevant was manually copied to the docs volume
19:27:47 <ianw> clarkb: yeah, it leaking into google via https://static.opendev.org/docs-old/ (which i guess has nothing to stop that) was a concern
19:28:08 <ianw> ok, well it sounds like i can remove it then
19:28:10 <fungi> we should probably add a robots.txt to exclude spiders from the whole static vhost
19:28:38 <clarkb> would it make sense to see if Ajaeger has an opinion?
19:28:46 <clarkb> since Ajaeger was pretty involved in that at the time iirc
19:29:44 <ianw> fungi: yeah, i can propose that. everything visible there should have a "real" front-end i guess
19:31:20 <clarkb> I don't have enough of the historical context to make a decision. I'll defer to others, but suggest maybe double checking with ajaeger if we can
19:31:57 <ianw> ok, i can ask, don't want to bother him with too much old cruft these days :)
19:32:21 <clarkb> ya I don't think ajaeger needs to help with cleanup or backups or anything, just indicate if he thinks any of it is worth saving
19:32:51 <clarkb> #topic planet.openstack.org
19:33:05 <clarkb> Another one I don't have a ton of background on, but I see a retire it option and I like the sound of that >_>
19:33:23 <clarkb> looks like the aggregator software is not being maintained anymore which puts us in a weird spot doing server updates
19:33:26 <ianw> yeah, linux australia retired their planet which made me think of it
19:33:40 <fungi> i guess we should probably at least let the folks using it know somehow
19:33:45 <fungi> like make an announcement
19:33:58 <clarkb> ++ and probably send that one to openstack-discuss given the service utilization
19:34:00 <ianw> i did poke at aggregation software, i can't see any that are python3 and maintained
19:34:00 <fungi> i could get the foundation to include a link to the announcement in a newsletter
19:34:16 <clarkb> basically say the software is not maintained and we can't find alternatives. We will retire the service as a result.
19:34:23 <ianw> i thought we could replace it with a site on static that has an OPML of the existing blogs if we like
19:34:37 <ianw> these days, an RSS to twitter feed would probably be more relevant anyway
19:34:38 <fungi> or if the foundation sees benefit in it, they may have a different way they would want to do something similar anyway
19:34:55 <fungi> yeah
19:35:27 <fungi> microblogging sites have really become the modern blog aggregators anyway
19:35:57 <ianw> (i did actually look for an rss to twitter thing too, thinking that would be more relevant. nothing immediately jumped out, a bunch of SaaS type things)
19:36:05 <clarkb> ya twitter, hacker news, reddit etc seem to be the modern tools
19:36:24 <clarkb> and authors just send out links from their accounts on those platforms
19:36:26 <ianw> vale RSS, RIP with google reader
19:37:10 <ianw> maybe give me an action item to remember and i can send that mail and start the process
19:38:31 <clarkb> #action ianw Announce planet.o.o retirement
19:38:42 <ianw> i am old enough to remember when jdub wrote and released the original "planet" and we all thought that was super cool and created a bunch of planets
19:39:02 <clarkb> #topic Tarballs ORD replication
19:39:26 <ianw> ok, last one, again from clearing out things earlier in the week
19:40:04 <ianw> of the things we might want to keep if a datacentre burns down, i think tarballs is pretty much the only one not replicated?
19:40:10 <ianw> #link https://etherpad.opendev.org/p/gjzssFmxw48Nn3_SBVo6
19:40:13 <ianw> that's the list
19:41:09 <ianw> docs is already replicated
19:41:15 <clarkb> ++ I think the biggest consideration has been that the vos release to a remote site of large sets of data isn't quick
19:41:23 <clarkb> I think tarballs is not as large as our mirrors but bigger than docs?
19:41:33 <clarkb> I also suspect that we can set it up and see how bad it is and go from there?
19:41:37 <fungi> yeah, in that ballpark
19:41:51 <fungi> also the churn is not bad as it's mostly append-only
19:42:05 <fungi> or at least that's the impression i have
19:42:17 <fungi> i guess we'll find out if that's really true
19:42:36 <ianw> yeah, i don't think it's day-to-day operation; just recovery situations
19:42:39 <ianw> which happen more than you'd hope
19:43:05 <ianw> but still, i'd hate to feel silly if something happened and we just didn't have a copy of it
19:44:07 <clarkb> ya I think this is the sort of thing where we can make the change, monitor it to see if it is unhappy and go from there
19:44:12 <ianw> ORD has plenty of space. we can always drop the RO there in a recovery situation i guess too, if we need
19:44:27 <ianw> alright, i'll set that up. lmk if you think anything else in that list is similar
19:44:29 <clarkb> I want to say the newer openafs version we upgraded to is better about higher latency links?
19:44:59 <ianw> apparently, but still there's only so fast data gets between the two when it's a full replication scenario
19:45:40 <clarkb> ianw: maybe do all the project.* volumes?
19:46:08 <clarkb> I think those host docs for various things like zuul and starlingx
19:46:25 <clarkb> mirror.* shouldn't matter and is likely to be the most impacted by latency
19:46:46 <ianw> yeah, probably a good idea. i can update the docs for volume creation because we've sometimes done it and sometimes not it seems
19:46:55 <clarkb> ++
19:47:24 <fungi> sure, small volumes are probably good to mirror more widely if for no other reason than we can, and they're one less thing we might lose in a disaster
19:48:12 <ianw> yeah, it all seems theoretical, but then ... fires do happen! :)
19:49:29 <clarkb> indeed
19:49:37 <clarkb> #topic Open Discussion
19:49:47 <clarkb> That was all on the published agenda
19:49:59 <ianw> i have a couple of easy ones from things that popped up
19:50:02 <clarkb> worth noting we think we have identified a zuul memory leak which is causing zk disconnects
19:50:11 <ianw> #link https://review.opendev.org/c/opendev/system-config/+/782868
19:50:15 <ianw> stops dstat output to syslog
19:50:24 <clarkb> fungi was going to restart the scheduler to reset the leak and keep us limping along. corvus mentioned being able to actually debug tomorrow
19:50:31 <ianw> #link https://review.opendev.org/c/opendev/system-config/+/783120
19:50:40 <ianw> puts haproxy logs into our standard container locations
19:50:59 <ianw> #link https://review.opendev.org/c/opendev/system-config/+/782898
19:51:06 <clarkb> ianw: the dstat thing is unexpected but change lgtm
19:51:09 <ianw> allows us to boot very large servers when they are donated to us :)
19:51:28 <clarkb> ha on that last one
19:52:10 <fungi> yeah, we're a few minutes out from being able to restart the scheduler without worrying about openstack release impact
19:52:26 <fungi> i'm just waiting for one build to finish updating the releases site
19:52:34 <ianw> is it helpful to restart with a debugger or anything for the leak?
19:53:10 <fungi> oh, clarkb, that oddity we were looking at with stale gerritlib used in a jeepyb job? it happened again when i rechecked
19:53:17 <ianw> clarkb: yeah, i was like "i'm sure i provided a reasonable size for boot from volume ... is growroot failing, etc. etc." :)
19:53:33 <clarkb> ianw: I want to say we already have a hook to run profiling on object counts
19:53:42 <clarkb> ianw: but that is a good question and we should confirm with corvus before we restart
19:53:49 <corvus> i have not previously used a debugger when debugging a zuul memory leak; only the repl and sigusr2
19:54:07 <corvus> i'm always open to new suggestions on debugging memleaks though :)
19:54:20 <clarkb> seems like the repl stuff and getting object counts has been really helpful in the past at least
19:56:38 <clarkb> corvus: when I've tried in the past it's been "fun" to figure out adding debugging symbols and all that. I suspect that since we use a compiled python via docker this may be even more fun?
19:56:49 <clarkb> we can't just install the debug symbols package from debian
19:57:05 <clarkb> (sorting that out may be a fun exercise for someone with free time though as it may be useful generally)
19:57:25 <clarkb> sounds like this may be about it. I can end here and we can go have breakfast/lunch/dinner :)
19:57:29 <clarkb> thank you everyone!
19:57:31 <clarkb> #endmeeting