Monday, 2020-10-26

[00:04] <ianw> ok, back to where i was last wednesday, i think borg backups are fixed on ethercalc now, so we can monitor that for the next bit
[12:52] <fungi> clarkb: looks like our etherpad and videoconference links are still at their defaults, i'll update them to meetpad with our planning pad now
[12:56] <clarkb> fungi: thanks
[12:59] *** sshnaidm has joined #opendev
[16:50] <clarkb> I've cleaned up one more stale db backup for etherpad
[16:51] <clarkb> there is one remaining which I'll do tomorrow after tonight's backups run
[18:39] <cloudnull> o/ - looks like Fedora 33 has removed the ssh-rsa w/ SHA1 which makes it so folks, like myself, aren't able to "git-review" without updating ssh config, I upgraded last Friday.
[18:39] <cloudnull> there's a BZ for it - - however, I suspect that's just the way it is. Any chance this is known and we're implementing git servers with SHA2 sometime soon?
[18:39] bug 1884920 in openssh "Cannot ssh into CentOS 6 using ssh key authentication" [Low,New] - Assigned to jjelen
[18:39] <cloudnull> figured I'd drop a line here, given I suspect folks will be running into this soon, given F33 is imminent.
[18:40] <clarkb> cloudnull: so there are a few things about that. First, openssh hasn't dropped it outright, just deprecated it; Fedora has decided to take a stricter approach. Second, you can enable sha1 on specific hosts, which is what I think we're currently suggesting to f33 users
[18:40] <clarkb> the last thing is that yes, we've been working through a gerrit upgrade which will address the problem on the server side
[18:41] <clarkb> but that got interrupted last week and we need to pick things back up again after the ptg and see if our previous upgrade planning still makes sense
[18:42] <cloudnull> yup, I commented in the BZ w/ the specific config I'm using which re-enables SHA1 for - just took me a bit to figure out why git-review was failing.
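The per-host workaround discussed here can be illustrated with a small Python helper that appends a host-specific block to an OpenSSH client config. This is a sketch under stated assumptions: `PubkeyAcceptedKeyTypes +ssh-rsa` is one commonly cited client-side workaround for Fedora 33's stricter policy, not necessarily the exact config from the bug comment (which is not reproduced in this log), and the function name `add_ssh_rsa_override` and the example host `review.opendev.org` are hypothetical.

```python
def add_ssh_rsa_override(config_text: str, host: str) -> str:
    """Return config_text with a Host block for `host` appended.

    The block re-allows the legacy ssh-rsa (SHA-1) signature algorithm
    for public key auth, but only when connecting to `host`.
    If a "Host" line already lists `host` exactly, the text is returned
    unchanged, so running this twice does not duplicate the block.
    """
    for line in config_text.splitlines():
        words = line.split()
        # The "Host" keyword is case-insensitive; patterns follow it.
        if len(words) >= 2 and words[0].lower() == "host" and host in words[1:]:
            return config_text
    stanza = f"Host {host}\n    PubkeyAcceptedKeyTypes +ssh-rsa\n"
    # Make sure the new block starts on its own line.
    if config_text and not config_text.endswith("\n"):
        config_text += "\n"
    return config_text + stanza
```

ssh reads `~/.ssh/config` before the system-wide file and keeps the first value it finds for each option, which is why a per-user override can take effect; an earlier `Host *` block in the same file that sets this option would still win, so the appended block may need to be moved above it. Newer OpenSSH releases call this option `PubkeyAcceptedAlgorithms`, with the old name kept as an alias.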
[18:42] <clarkb> we have successfully tested an upgrade of gerrit to 3.2 using a snapshot of production from october 1, but we want to run through that again and we also have to take a lengthy outage to do the notedb conversion
[18:43] <clarkb> largely a matter of being confident in the upgrade process and then finding a reasonable time to take a long outage
[18:43] <cloudnull> I suspect there will be a good influx of similar queries tomorrow - when F33 is official
[18:45] <clarkb> and in 6 months everyone will be wondering why they have no disk but df shows plenty of free space :P
[18:45] <clarkb> then everyone will learn about btrfs balancing
[18:45] <clarkb> anyway an upgrade will hopefully happen soon for some value of soon. Just need to get through the PTG first and test another upgrade
[18:46] <cloudnull> as a former OpenSUSE user that one won't be a head scratcher for me :P
[18:46] <cloudnull> thanks clarkb, just figured I'd drop a line about it.
[18:47] <clarkb> one thought i had this morning was to announce the november 20-22 outage window under the assumption we'll rerun testing and be happy with it, then simply push it back if we feel we aren't ready
[18:49] <clarkb> infra-root ^ we can discuss this more in our 23:00-01:00 block but I'd be curious to hear what others think of that
[18:58] <fungi> sounds fine to me, though i would like to entertain mnaser's suggestion in case it allows us to fit the entire upgrade into a single day
[18:58] <clarkb> ya trying that next week really quickly might be a good next step
[18:58] <clarkb> I doubt we'll make much headway on the testing side of things this week with the ptg
[19:00] <clarkb> my concern with that is we'd have to add a bunch of steps to copy state first (we're already going to back things up, so that's probably not going to add significant overhead, it's more just another thing to get wrong)
[19:00] <clarkb> but we can test it and see how painful it is
[19:04] <fungi> i'm happy to help/drive gerrit upgrade test repeat next week
[19:05] <clarkb> that would be great. I imagine I can help/drive too but I think the more eyeballs we have on the process the more likely we are to catch problems early
[19:05] <fungi> agreed, getting more folks involved would be awesome
[19:16] <corvus> fungi, clarkb: what was mnaser's suggestion?
[19:17] <clarkb> corvus: he suggested that we could boot a surrogate gerrit in vexxhost on one of their new fast flavors with raid 0 ssds and use that to do the notedb migration
[19:17] <clarkb> (perhaps even the rest of the upgrade too) then copy that back into place on the production node
[19:18] <clarkb> we would need to test that that is actually faster, but if it is, it may represent a good way to reduce outage window time
<clarkb> starting to sketch out what an upgrade announcement might look like at
[19:33] <fungi> yeah, i think for the vexxhost experiment, we should basically copy our initial state there, then repeat all our upgrade steps and time each one
[19:33] <fungi> also time the data transfer into and out of vexxhost obviously since that will add possibly a little deficit to the total outage
[19:38] <corvus> rsync should be effective to reduce that on the git repos
[19:38] <corvus> probably won't make a big deal with a db dump
[19:39] <fungi> agreed, we don't need to add any initial rsync for git repos into the final total, just a nearly-empty last rsync after stopping the service
[19:40] <fungi> however given the aggressive git gc performed (more than once) during upgrades, i doubt the rsync back out will save much time over a full copy
[19:40] <fungi> but i'd probably use rsync again in that direction too just for consistency
[19:41] <fungi> we also might want to disable periodic gc/repacking once we start the first rsync so that we don't create more work for ourselves
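The rehearsal plan above (copy the initial state, time every step, and count only the final near-empty rsync rather than the initial bulk copy toward the outage) can be sketched as a tiny Python timing harness. The step names, the `in_outage` flag and the no-op actions are hypothetical placeholders; real steps would shell out to rsync, the database dump and the Gerrit upgrade commands, none of which are specified in this log.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class StepResult:
    name: str
    seconds: float
    in_outage: bool  # True if production gerrit would be down during this step


def run_timed(steps: List[Tuple[str, Callable[[], None], bool]]) -> List[StepResult]:
    """Run (name, action, in_outage) steps in order, timing each one."""
    results = []
    for name, action, in_outage in steps:
        start = time.monotonic()
        action()
        results.append(StepResult(name, time.monotonic() - start, in_outage))
    return results


def outage_seconds(results: List[StepResult]) -> float:
    """Total only the steps that run after the service is stopped.

    The initial bulk rsync happens while production is still serving,
    so it is timed for information but left out of the outage total.
    """
    return sum(r.seconds for r in results if r.in_outage)


def noop() -> None:
    """Hypothetical stand-in for a real command."""


if __name__ == "__main__":
    plan = [
        ("initial rsync of git repos to surrogate", noop, False),
        ("stop gerrit, final rsync of git repos", noop, True),
        ("copy db dump to surrogate", noop, True),
        ("upgrade and notedb migration", noop, True),
        ("rsync results back to production", noop, True),
    ]
    results = run_timed(plan)
    for r in results:
        print(f"{r.name}: {r.seconds:.1f}s (outage: {r.in_outage})")
    print(f"estimated outage: {outage_seconds(results):.1f}s")
```

Keeping the pre-outage copy in the same report makes it easy to see how much work the initial rsync saves, while the outage total reflects only what users would actually wait for.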
[21:23] <ianw> CPU usage has again spiked. i'm seeing occasional 502 responses
[21:24] <ianw> (in the logs)
[21:26] <ianw> it's another google cloud hosted ip walking all changes in the same way we've seen before
[21:30] <ianw> i've blocked it in iptables
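To confirm that a single client is behind a spike like this before blocking it, one might tally requests and 502 responses per client IP from the web server's access log. The sketch below assumes an Apache/nginx "combined"-style log, where the client address is the first field and the status code follows the quoted request line; the actual log format on the server is not stated here, and the function name `top_clients` is hypothetical. The blocking itself was done with iptables and is not shown.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# client IP, ident, user, [timestamp], "request line", status code
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (\d{3}) ')


def top_clients(lines: Iterable[str], n: int = 5) -> Tuple[List[Tuple[str, int]], Counter]:
    """Return the n busiest client IPs and a per-IP count of 502 responses."""
    hits: Counter = Counter()
    errors_502: Counter = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that are not in the assumed format
        ip, status = m.group(1), m.group(2)
        hits[ip] += 1
        if status == "502":
            errors_502[ip] += 1
    return hits.most_common(n), errors_502
```

Passing an open access log file as `lines` would make a crawler walking every change stand out at the top of the first list.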
[21:44] <clarkb> fwiw I think it was ttx that suggested it may be bitergia, we may want to reach out to them and double check?
[21:45] <clarkb> another idea I threw out there was that we could see if we can force auth for access then we would get contact info?
[21:47] <clarkb> infra-root thoughts on an infra meeting tomorrow? are we wanting to do that or are we swamped with PTG?
[21:49] <fungi> i'm not so sure we need the weekly meeting when we also have ptg (though i technically don't have any conflicts for that hour)
[21:53] <ianw> I could file something in the google abuse form? i can say that rather than blocking their queries we'd rather work with the owners and give some contact details
[21:54] <clarkb> ianw: ya if it isn't bitergia that may be the next step. It seems is hosted on google so wouldn't be surprised if it is them
[21:55] <ianw> do we have a contact, other than like a form on the webpage?
[21:55] <clarkb> ianw: I think ttx does?
[22:00] <ianw> there is a person on twitter who says they work for them who is followed by a lot of openstacky-techy people i follow
[22:01] <ianw> oh, haha, that's the ceo
[22:10] <tosky> can anyone please ban that annoying troll on #openstack-meeting-alt?
[22:10] <tosky> and possibly ban the username, which has been reused despite being connected through a web IRC client
[22:10] <clarkb> let me see, I'm not in that channel so don't have scrollback
[22:13] <ianw> i'm in there and just kicked them
<fungi> reminder to our irc patrolling volunteers, our suggestions for dealing with similar cases are here:
[22:15] <corvus> ianw is turing test proctor
[23:01] <fungi> i'll be a couple minutes late, the kettle's not up to a boil quite yet
[23:04] <ianw> life changing
[23:05] <fungi> yeah. i've been tempted but i don't drink tea all that often
