19:01:11 <clarkb> #startmeeting infra
19:01:13 <openstack> Meeting started Tue Feb 16 19:01:11 2021 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:14 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:16 <openstack> The meeting name has been set to 'infra'
19:01:21 <clarkb> #link http://lists.opendev.org/pipermail/service-discuss/2021-February/000184.html Our Agenda
19:01:34 <clarkb> Sorry, I had meant to send this out yesterday and got it put together, but then got distracted by server upgrades
19:03:30 <clarkb> #topic Announcements
19:03:38 <clarkb> There were none listed
19:03:40 <clarkb> #topic Actions from last meeting
19:03:48 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-02-09-19.01.txt minutes from last meeting
19:03:57 <clarkb> we had two (though only one got properly recorded)
19:04:07 <clarkb> ianw was looking at wiki borg backups (I think this got done?)
19:04:49 <ianw> yes, wiki is now manually configured to be backing up to borg
19:05:28 <clarkb> corvus had an action to unfork jitsi meet
19:05:38 <clarkb> the web component at least (everything else is already unforked)
19:06:09 <corvus> not done, feel free to re-action
19:06:22 <clarkb> #action corvus unfork jitsi meet web component
19:06:32 <clarkb> #topic Priority Efforts
19:06:37 <clarkb> #topic OpenDev
19:06:38 <fungi> i also saw ianw's request for me to double-check the setup on wiki.o.o, will try to get to that after the meeting wraps up
19:06:43 <clarkb> fungi: thanks
19:07:19 <clarkb> I did further investigation of the gerrit inconsistent accounts and wrote up notes on review-test
19:07:31 <clarkb> I won't go through all the status of things because I don't think much has changed since the last meeting
19:07:49 <clarkb> but I could use another set or two of eyeballs to look over what I've written down to see if the choices described there make sense
19:08:00 <clarkb> if they do, then the next step is likely to make that staging All-Users repo and start committing changes
19:08:14 <clarkb> we don't need to work through that in the meeting, but if you have time to look at it and want me to walk you through it, let me know
19:08:58 <clarkb> I was going to call out a couple of Gerrit 3.3 related changes, but it looks like both have merged at this point. Thank you reviewers
19:09:29 <clarkb> For the gitea OOM problems we've noticed recently I pushed up a haproxy rate limiting framework change
19:09:31 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/774023 Rate limiting framework change for haproxy.
19:09:57 <clarkb> I doubt that is mergeable as is, but if you have a chance to review it and provide thoughts like "never" or "we could probably get away with $RATE", that may be useful for future occurrences
19:10:05 <fungi> i'm feeling like if the count of conflicting accounts is really that high, we should consider sorting by which have the most recent (review/owner) activity and prioritize those, then just disable any which are inactive and let people know, rather than manually investigating hundreds of accounts
19:10:05 <clarkb> that said, I am beginning to suspect that these problems may be self-induced
19:10:43 <clarkb> fungi: yup, I'm beginning to think that may be the case. We could do a rough quick search for active accounts, manually check and fix those, then do retirement for all others
19:10:44 <fungi> er, inactive in the sense of not used recently
19:10:56 <fungi> not the inactive account flag specifically
19:11:09 <clarkb> I can look at the data from that perspective and write up a set of alternate notes
19:11:15 <fungi> maybe also any which are referenced in groups
19:11:31 <clarkb> judging based on the existing data I expect that may be something like 50 accounts max that we have to sort out manually, and the rest we can just retire
19:11:32 <fungi> but those are likely very few at this point
19:11:43 <clarkb> but would need to do that audit
19:11:55 <fungi> i can try to help with that
19:11:58 <clarkb> thanks
19:12:21 <clarkb> To help investigate further whether the gitea OOMs may be self-inflicted by our aggressive project description updates, I've been trying to get some server metrics into our system-config-run jobs
19:12:23 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/775051 Dstat stat gathering in our system-config-run jobs to measure relative performance impacts.
19:12:49 <clarkb> That failed in gitea previously, but I just pushed a rebase to help make gerrit load testing a thing
19:13:06 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/775883 Gerrit load testing attempt
19:13:26 <clarkb> there was a recent email to the gerrit mailing list about gatling-git, which can be used to do artificial load testing against a gerrit, and that inspired me to write ^
19:13:33 <clarkb> I think that could be very useful for us if I can manage to make it work
19:13:43 <clarkb> in particular I'm interested in seeing differences between 3.2 and 3.3
19:13:54 <ianw> lgtm; unfortunately there didn't seem to be a good way i could find to provide a visualization of that
19:14:05 <ianw> (dstat)
19:14:24 <clarkb> ya, I think we can approach this step by step and add bits we find are lacking or would be helpful
19:14:49 <clarkb> anyway, this has all been in service of me trying to better profile our services, as we've had a couple of issues around that recently
19:15:00 <clarkb> I think the work is promising but still early and may have very rough edges :)
19:15:30 <clarkb> ianw and fungi also updated some links on the opendev docs and front page to better point people at our incident list
19:15:43 <clarkb> Are there any other opendev related items to bring up before we move on?
19:17:37 <clarkb> #topic Update Config Management
19:18:14 <clarkb> ianw: the new refstack deployment is happy now? we are just waiting on testing before scheduling a migration?
19:18:51 <ianw> well i guess i have migrated it
19:19:01 <ianw> the data at least
19:19:18 <ianw> yes, not sure what else to do other than click around a bit?
19:19:25 <clarkb> right, but refstack.openstack.org is still pointed at the old server (so we'll need to do testing, then schedule a downtime where we can update dns and remigrate the data)
19:19:42 <clarkb> I think kopecmartin had some ideas around testing, probably just point kopecmartin at it to start and see what that turns up
19:19:42 <ianw> has any new data come into it?
19:20:01 <clarkb> new data does occasionally show up, though I don't know if it has in this window
19:20:22 <ianw> you can access the site via https://refstack01.openstack.org/#/
19:21:02 <clarkb> I'll try to catch kopecmartin and point them to ^
19:21:05 <clarkb> and then we can take it from there
19:21:10 <ianw> ++
19:21:25 <clarkb> fungi: ianw: I also saw that ansible was reenabled on some afs nodes
19:21:32 <clarkb> any updates on that to go over?
19:22:40 <fungi> i think it's all caught up, now we can focus on ubuntu upgrades on those
19:22:45 <ianw> that was a small problem i created that fungi fixed :)
19:23:04 <ianw> yep, trying some in-place focal upgrades is now pretty much top of my todo
19:23:05 <fungi> more like a minor oversight in the massive volume of work you completed to get all that done
19:23:25 <clarkb> ++ and thanks for the followup there fungi
19:23:36 <clarkb> Any other config management items to cover?
19:24:23 <ianw> semi related is
19:24:25 <ianw> #link https://review.opendev.org/c/opendev/system-config/+/775546
19:24:31 <ianw> to upgrade grafana, just to keep in sync
19:24:58 <clarkb> looks like an easy review
19:25:48 <clarkb> #topic General Topics
19:26:00 <clarkb> We just went over afs so we can skip to bup and borg backups
19:26:05 <clarkb> #topic Bup and Borg Backups
19:26:13 <clarkb> wiki has been assimilated
19:26:54 <fungi> resistance was substantial, but eventually futile
19:27:02 <clarkb> any other updates? should we consider removing this topic from our meetings?
19:27:52 <ianw> umm maybe keep it for one more week as i clean it up
19:28:00 <ianw> #link https://review.opendev.org/c/opendev/system-config/+/766630
19:28:09 <ianw> would be good to look at, which removes bup things
19:28:14 <clarkb> ok
19:28:32 <ianw> i left removing the cron jobs as a manual task, it's easy enough to just delete them
19:29:00 <clarkb> sounds good
19:30:11 <clarkb> #topic Enable Xenial to Bionic/Focal system upgrades
19:30:19 <clarkb> #link https://etherpad.opendev.org/p/infra-puppet-conversions-and-xenial-upgrades Start capturing TODO list here
19:30:48 <clarkb> please add additional info on todo items there. I add them as I come across them (though have many other distractions too)
19:31:13 <clarkb> I also intend to start looking at zuul, nodepool, and zookeeper os upgrades as soon as the zuul release settles
19:31:34 <clarkb> I'm hopeful we can largely just roll through those by adding new servers and removing old ones
19:31:41 <clarkb> the zuul scheduler being the exception there
19:32:11 <fungi> if we were already on zuul v5...
19:32:13 <fungi> ;)
19:32:27 <clarkb> if others have time to start looking at other services (I know ianw has been talking about looking at review, thanks) that would be much appreciated
19:33:37 <clarkb> #topic opendev.org not reachable via IPv6 from some ISPs
19:33:51 <clarkb> frickler put this item on the agenda. frickler: are you around to talk about it? If not I'll do my best
19:34:02 <frickler> yeah, so I brought this up mainly to add some nagging toward mnaser
19:34:17 <frickler> or maybe find some other contact at vexxhost
19:34:48 <frickler> the issue is that the IPv6 prefix vexxhost is using is not properly registered, so some ISPs (like mine) are not routing it
19:34:56 <clarkb> noonedeadpunk is another contact there
19:35:30 <frickler> oh, great, I can try that
19:35:45 <fungi> it's specifically about how the routes are being announced in bgp, right?
19:36:20 <frickler> the issue is in the route registry, which providers use to filter bgp announcements
19:36:40 <fungi> usually the way we dealt with it in $past_life was to also announce our aggregates from all borders
19:36:44 <frickler> they registered only a /32, but announce multiple /48s instead
19:36:57 <clarkb> I see, so it's a separate record that routers will check against to ensure they don't accept bad bgp advertisements?
19:37:26 <fungi> so you announce the /32 to all your peers but also the individual /48 prefixes or whatever from the gateways which can route for them best
19:37:56 <frickler> vexxhost only needs to create route objects for the individual /48s matching what they announce via bgp
19:38:07 <fungi> and yes, there is basically a running list maintained by the address registries which says which prefix lengths to expect
19:38:40 <fungi> out of what ranges
19:39:35 <frickler> the prefix opendev.org is in is 2604:e100:3::/48, which is what they announce via their upstreams
19:39:51 <fungi> and operators wishing to optimize their table sizes use that list to implement filters
19:39:52 <frickler> but a route object only exists for 2604:e100::/32
19:40:14 <frickler> no, that's not about table size, it is general bgp sanity
19:40:31 <frickler> except not too many providers care about that
19:40:42 <frickler> but I expect that to change in the future
19:40:50 <fungi> the main sanity they care about is "will the table overrun my allocated memory in some routers"
19:41:18 <fungi> (and it's no fun when your border routers start crashing and rebooting in a loop as soon as they peer, let me tell you)
19:41:29 <frickler> this is more related to the possibility of route hijacking
19:42:05 <clarkb> frickler: where does this registry live? arin? (those IPs are hosted in the USA iirc)
19:42:09 <fungi> yeah, but that possibility exists with or without that filter list, and affects v4 as well
19:42:35 <frickler> in that case it would be arin maybe, though the /32 is registered in radb
19:42:46 <clarkb> (mostly just curious, I know we can't update it for them)
19:43:05 <frickler> I don't know all the details for american networks, in europe it would be RIPE
19:43:40 <clarkb> ok, in any case I would see if noonedeadpunk can help
19:43:59 <ianw> (ftp://ftp.radb.net/radb/dbase/level3.db.gz contains a large amount of ascii art of cartoon characters, which is ... interesting)
19:44:33 <clarkb> anything else on this topic?
19:45:06 <frickler> no, fine for me
19:45:18 <clarkb> #topic Open Discussion
19:45:21 <clarkb> Anything else?
19:46:13 <fungi> yeah, the individual lirs make and (generally) publish their allocation policies indicating what size allocations they're making from what ranges
19:46:28 <fungi> they tend to expect you to at least have aggregates announced for those
19:47:09 <fungi> er, s/lirs/rirs/
19:48:29 <clarkb> sounds like that may be it?
19:48:34 <clarkb> I'll give it another couple of minutes
19:49:51 <fungi> you find recommendations like "route-filter 2600::/12 prefix-length-range /19-/32;" in old lists, e.g. https://www.space.net/~gert/RIPE/ipv6-filters.html
19:50:27 <fungi> that's the /12 which covers our address, and the recommendation is to only accept prefixes between /19 and /32 long in it
19:51:08 <clarkb> and sounds like that may be it, thanks everyone.
19:51:11 <fungi> so if a provider is using a filter like that, they'll discard the /48 routes vexxhost is announcing
19:51:14 <clarkb> we can continue the ipv6 discussion in #opendev
19:51:17 <fungi> thanks clarkb!
19:51:20 <clarkb> #endmeeting
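
Appended sketch for the haproxy rate limiting framework mentioned at 19:09 (change 774023): a minimal example of per-source connection-rate limiting with a stick table. This is a generic illustration of the technique, not the contents of the change under review; the frontend/backend names, table size, window, and the 100-connections-per-10-seconds threshold are all placeholders.

    frontend git-https
        bind *:443
        mode tcp
        # Track each client source address in a stick table, recording its
        # connection rate over a sliding 10 second window.
        stick-table type ip size 100k expire 60s store conn_rate(10s)
        tcp-request connection track-sc0 src
        # Reject new connections from sources exceeding the threshold;
        # everything else is balanced to the gitea backends as usual.
        tcp-request connection reject if { sc0_conn_rate gt 100 }
        default_backend gitea-https

The open question raised in the meeting is what threshold (if any) is safe for legitimate CI traffic, which is why the change was posted as a framework for review rather than with a specific rate.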
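
Appended sketch for the IPv6 discussion: the kind of IRR route object frickler describes at 19:37 would look roughly like the following in RPSL form. The /48 prefix is the one quoted in the log; the origin ASN and maintainer are placeholders, since the log does not record vexxhost's ASN or RADb maintainer object.

    route6:     2604:e100:3::/48
    descr:      vexxhost prefix containing opendev.org
    origin:     AS64496         # placeholder ASN (documentation range), not vexxhost's real ASN
    mnt-by:     MAINT-EXAMPLE   # placeholder maintainer object
    source:     RADB

With route6 objects registered for each announced /48, providers that build their BGP filters from IRR data would accept those more-specific routes instead of discarding them; prefix-length filters like the one fungi quotes from the old RIPE list are a separate concern.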