Tuesday, 2021-06-29

clarkbAnyone else here for the meeting?19:00
ianwo/19:01
clarkb#startmeeting infra19:01
opendevmeetMeeting started Tue Jun 29 19:01:06 2021 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:01
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:01
opendevmeetThe meeting name has been set to 'infra'19:01
clarkb#link http://lists.opendev.org/pipermail/service-discuss/2021-June/000262.html Our Agenda19:01
clarkb#topic Announcements19:01
clarkbNo real announcements other than my life is returning to normally scheduled day to day so I'll be around at typical times now19:01
clarkbThe one exception to that is Monday is apparently the observation of a holiday here19:02
fungiyes, the one where citizens endeavor to celebrate the independence of their nation by blowing up a small piece of it19:02
diablo_rojoo/19:02
fungialways a fun occasion19:02
clarkbfungi: yup, but also this year I think we are declaring the pandemic is over here and we should remove all precautions19:02
fungiblowing up in more ways than one, in that case19:03
clarkbBut I'll be around Tuesday and we'll have a meeting as usual19:03
clarkb#topic Actions from last meeting19:03
clarkb#link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-06-15-19.01.txt minutes from last meeting19:03
clarkbI have not brought up the ELK situation with openstack leadership yet. diablo_rojo fyi I intend on doing that when I find time in the near future. Mostly just to plan out what we are doing next as far as wind down goes19:04
clarkb#action clarkb Followup with OpenStack on ELK retirement19:04
clarkbianw: have ppc packages been cleaned up from centos mirrors?19:04
fungithough they're presenting a call to action for it to the board of directors tomorrow19:04
diablo_rojomakes sense to me. 19:05
fungi(the elastic recheck support request, i mean)19:05
clarkbfungi: yup, I don't think we have to say "it's turning off tomorrow", more of a "we are doing these things, you are doing those things, when is a reasonable time to say it's dead or not"19:05
clarkband start to create the longer term expectations19:05
ianwclarkb: yep, that was done https://review.opendev.org/c/opendev/system-config/+/79736519:05
clarkbianw: excellent thanks!19:06
clarkband I don't think a prometheus replacement for cacti spec has been written yet either. I'm mostly keeping this on the list because i think it is a good idea and keeping it visible can only help make it happen :)19:06
clarkb#action someone write spec to replace Cacti with Prometheus19:06
fungialso, while it didn't get flagged as an action item it effectively was one:19:07
fungi#link https://review.opendev.org/797990 Stop updating Gerrit RDBMS for repo renames19:07
funginow i can stop forgetting to remember to do that19:07
clarkbfungi: great, I'll have to give that a review (I've been on a review push myself the last few days trying to catch up on all the awesome work everyone has been doing)19:07
clarkb#topic Topics19:08
clarkb#topic Eavesdrop and Limnoria19:08
clarkbWe discovered there was a bug in the channel log conversion from raw text logs to html that may have explained the lag people noticed in those files19:08
clarkbbasically we ran the conversion once an hour instead of every 15 minutes. Fungi wrote a fix for that.19:09
fungiand it merged19:09
fungiso should be back to behavnig normally now19:09
fungibehaving19:09
clarkbWould be good to keep an eye out for any new reports of lag in those logs, but I think we can call it fixed now based on what we saw timestamp wise yesterday19:09
clarkb++19:09
ianwsorry about that, missed a * from the old job :/19:09
fungithat was the new lag, by the way, the old lag before that was related to flushing files19:10
fungiso we actually had two lag sources playing off one another19:10
clarkbah cool I wasn't sure if we saw lag in the text files previously or only html19:10
clarkbtext files were happy yesterday it seemed like when we looked at least and then we fixed the html side19:10
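
The hourly-versus-15-minute difference discussed above is the kind of thing a single cron minute field can cause. The log does not show the actual crontab entries, so the following is only a minimal sketch under the assumption that the job ended up with `0` in the minute field where `*/15` was intended. `expand_minute_field` is a hypothetical helper written for illustration, not part of cron or of the eavesdrop tooling.

```python
def expand_minute_field(field):
    """Expand a cron minute field into the sorted minutes of the hour it matches.

    Handles only the simple forms needed here: '*', '*/N', 'A-B', 'A-B/N',
    single integers, and comma-separated lists of those.
    """
    minutes = set()
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/", 1)
            step = int(step_text)
        if part == "*":
            start, end = 0, 59
        elif "-" in part:
            low, high = part.split("-", 1)
            start, end = int(low), int(high)
        else:
            start = end = int(part)
        minutes.update(range(start, end + 1, step))
    return sorted(minutes)


# Assumed broken schedule: fires once an hour, on the hour.
print(expand_minute_field("0"))      # [0]
# Assumed intended schedule: fires every 15 minutes.
print(expand_minute_field("*/15"))   # [0, 15, 30, 45]
```

With the first form the html conversion can trail the raw text logs by up to an hour; with the second the worst case drops to about 15 minutes.
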
clarkb#topic Gerrit Account Cleanup19:11
clarkbI'm hoping to find time for this among everything else and deactivate those accounts whose external ids we'll delete later19:11
clarkbfungi: you started to look at that more closely, have you had a chance to do a sufficient sampling to be comfortable with the list?19:12
fungiyes, my spot-checking didn't turn up any concerns19:13
clarkbgreat, I'll try to pencil this in for the end of the week then and do the account retirement/deactivation then in a few weeks we can do the external id deletions for all those that don't complain (and none should)19:13
clarkb#topic Review Upgrade19:14
clarkb#link https://etherpad.opendev.org/p/gerrit-upgrade-2021 Upgrade Checklist19:14
clarkbThe agenda says this document is ready for review. infra-root please take a look at it19:14
clarkbianw: does the ipv6 problem that recently happened put a pause on this while we sort that out?19:14
ianwi'm not sure, i rebooted the host and it came back19:15
ianwi know there was some known issue at some point that required a reboot prior, it might have been broken then19:16
clarkbianw: the issue that happened on the cloud side?19:16
clarkbConsidering that we do have the option of removing the AAAA record from DNS temporarily if necessary I suspect this isn't critical. But others may feel more strongly about ipv619:16
fungithere was a host outage and reboot/migration forced at one point, but i don't recall how long ago19:17
fungiand probably didn't track it closely since the server was not yet in production19:17
ianwright, that feels like the sort of thing that duplicate addresses might pop up in19:17
clarkbit happened the weekend after I did all those focal reboots19:17
clarkbI remember because I delayed review02s reboot and then vexxhost took care of it for me :)19:17
fungiahh, right, mnaser let us know about it, could find a more precise time in irc logs19:18
clarkband ya that seems like a possibility if there was a migration with two instances out there fighting over arp19:18
clarkb(or even just not properly flushing the router's tables first)19:18
fungiwell, DAD operates on the server seeing evidence of a conflict19:19
fungiso presumably there really were two systems trying to use the same v6 address at the same moment19:19
clarkbgot it19:19
ianwanyway, if we can work on that checklist, i'm happy to maybe do this on a .au monday morning.  that's usually a very quiet time19:19
ianwi'm not sure if we could be ready for the 5th, but that would be even quieter19:20
fungiwill do, thanks!19:20
clarkbianw: yup I'll need to add that to my list of reviews for today. And I can do .au morning as well usually. Since that overlaps with my afternoon/evening without too much pain19:20
clarkbianw: I think your suggested date of the 19th is probably reasonable19:20
clarkbthat way we can announce it with a couple of weeks of notice too (so that firewall rules can be updated in various places if necessary)19:20
clarkbmaybe plan to send that out in a couple of days after we have a chance to double check your checklist19:21
ianwthe 12th maybe too, although i'll be out a day or two before that (still deciding on plans wrt. lockdowns, etc.)19:21
clarkbI like giving a bit of notice for this and 19th feels like a good balance between too little and too much19:22
clarkbinfra-root ^ feel free to weigh in though19:22
fungiin the past we've announced the new ip addresses somewhat in advance19:23
clarkbyes in the past we've tried to do ~4 weeks iirc19:23
fungisince a number of companies maintain firewall exceptions allowing their employees or ci systems to connect19:23
clarkbbut we also had more companies with strict firewall rules than we have today (or at least they don't complain as much anymore)19:23
ianwok, i can construct a notification for that soon then, as i don't see any reason we'll change the ip and reverse dns is setup too19:23
fungiright, i do think 4 weeks is probably excessive today19:24
fungibut if we can give them a heads up, sooner would be better than later19:24
clarkb++19:24
clarkbwe could even advertise the new IPs with a no sooner than X date19:24
clarkbthen they can add firewall rules and keep the old one in place until we do the switch19:25
clarkbbut the 19th seems like a good option to me.19:25
clarkbshould cross check against release schedules for various projects but I think that is relatively quiet time19:25
clarkbAnything else on the review upgrade topic?19:25
ianwnot really, i just want to get the checklist as detailed as possible19:26
fungii got nothin'19:26
fungithanks for organizing this, ianw!19:26
clarkb#topic Listserv upgrades19:26
clarkb++ thanks!19:26
clarkbI've somewhat stalled out on this and worry I've got a number of other tasks that are just as or more important fighting for time19:27
clarkbIf anyone else wants to boot the test node and run through an upgrade on it I've already started notes on an etherpad somewhere I should dig up again. But if not I'll keep this on my list and try to get to it when I can19:27
clarkbMostly this is a heads up that I'm probably not getting to it this week. Hopefully next19:28
clarkb#topic Draft matrix spec19:28
clarkb#link https://review.opendev.org/796156 Draft matrix spec19:28
clarkbI reached out to EMS (element matrix services) today through a contact that corvus had19:28
clarkbTheir day was largely already over but they said they will try to schedule a call with me tomorrow.19:29
clarkbI suspect that corvus would be interested in being on that call. Is anyone else interested too? We'll be overlapping with pacific timezone and europe so the window for that isn't very large19:29
corvusthanks!  i'm hoping we can narrow the options down and revise the spec with something more concrete there19:30
clarkbI suspect this initial conversation will be super high level and not incredibly important for everyone to be on. But I'm happy to include others if there is interest19:30
clarkbcorvus: ++19:30
fungii can be on the call, but am happy to entrust the discussion to the two of you19:31
clarkbalright I'll see what they say schedule wise tomorrow19:32
clarkb#topic gitea01 backups19:32
clarkbNot sure if anyone has looked into this yet but gitea01 seems to be failing to backup to one of our two backup targets19:32
ianwis it somewhat random?19:32
clarkbThought I would bring it up here to ensure it wasn't forgotten. I don't think this is super urgent as we haven't made any recent project renames (which would update the db tables that we want to backup)19:33
fungii haven't checked the logs, just noticed the notifications to the root inbox19:33
clarkbianw: no it seems to happen consistently each day19:33
fungiseems like it's consistently every day19:33
clarkbthe consistency is why I believe only one backup target is affected19:33
clarkb(otherwise we'd see multiple timestamps?)19:33
ianwi'm sure it's mysql dropping right?19:33
fungiappears to have started on 2021-06-1219:33
clarkbianw: I haven't even dug in that far, but probably a good guess19:34
mordredclarkb: (sorry, I'd also love to be on the matrix call, but obviously don't block on me)19:34
ianwhttp://paste.openstack.org/show/807046/19:34
fungisocket timeouts maybe?19:35
clarkbmordred: noted19:35
fungii wonder if the connection goes idle waiting on the query to complete19:35
ianwbut only to the vexxhost backup19:35
fungiwhich implies some router in that path dropping state prematurely19:36
fungior nat if we're doing a vip19:36
ianwand this runs in vexxhost, right?  so the external further-away rax backup is working19:36
clarkbyup gitea01 is in sjc vexxhost19:36
clarkband the mysql is localhost19:36
fungioh, it's vexx-to-vexx dropping? hmm... yeah that's strange19:37
fungiand same region presumably19:37
ianw64 bytes from 2604:e100:1:0:f816:3eff:fe83:a5e5 (2604:e100:1:0:f816:3eff:fe83:a5e5): icmp_seq=6 ttl=47 time=72.0 ms19:37
ianw64 bytes from backup01.ord.rax.opendev.org (2001:4801:7825:103:be76:4eff:fe10:1b1): icmp_seq=3 ttl=52 time=49.9 ms19:37
ianwthe ping to rax seems lower19:37
fungialso surprising19:37
clarkbif the backup server is in montreal then that would make sense19:38
clarkbsince ord is slightly closer to sjc than montreal19:38
clarkbanyway we don't have to do live debugging in the meeting. I just wanted to bring it up as a not super urgent issue but one that should probably be addressed19:38
clarkb(the db backups in both sites should be complete until we do a project rename)19:39
fungii thought he was saying that higher rtt was locally within vexxhost19:39
fungibut yeah, we can dig into it after the meeting19:39
clarkbas it is project renames that update the redirects which live in the db19:39
ianwthis streams the output of mysqldump directly to the server19:39
clarkb#topic Scheduling Project Renames19:39
ianwso if anyone knows any timeout options for that, let me know :)19:40
clarkbLets move on and then we can discuss further at the end or eat lunch/breakfast/dinner :)19:40
fungiin theory we can "just do it" now that the rename playbook no longer tries to update the nonexistent mysql db19:40
clarkbFor project renames do we want to try and incorporate that into the server move? My preference would be that maybe we do the renames the week after once we're settled into the new server and not try to overdo it19:40
fungii don't think we had any other pending blockers besides actual scheduling anyway19:40
clarkbfungi: linked one of the changes we need to do renames19:40
clarkb#link https://review.opendev.org/c/opendev/system-config/+/797990/19:41
fungiyeah, once that merges i mean19:41
clarkbAnyone have a concern with doing the renames a week after the move?19:41
clarkbThat should probably be enough time to be settled in on the new server and if not we can always reschedule19:42
ianw++19:42
fungiwfm19:42
clarkbbut that gives us a time frame to tell people to get their requests in for19:42
clarkbgreat19:42
fungiand also a window to do any non-urgent post-move config tweaks19:42
clarkb++19:42
fungiin case we spot things which need adjusting19:42
clarkb#topic Open Discussion19:43
clarkbAnything else to bring up?19:44
diablo_rojoI think I have the container mostly setup for the ptgbot?19:44
clarkbdiablo_rojo: oh cool are there changes that need review?19:44
diablo_rojooh. failing zuul though.19:44
fungion the oftc migration wrap-up, i have an infra manual change which needs reviewing:19:45
diablo_rojoclarkb, just the one kinda? I havent written the role yet for it. Started with setting up the container19:45
fungi#link https://review.opendev.org/797531 Switch docs from referencing Freenode to OFTC19:45
clarkbdiablo_rojo: have a link?19:45
diablo_rojohttps://review.opendev.org/c/openstack/ptgbot/+/79802519:45
clarkbgreat I'll try to take a look at that change too. Feel free to reach out about the failures too19:46
clarkbfungi: that looks like a good one to get in ASAP to avoid any additional confusion that may be causing19:47
fungithere was some discussion between other reviewers about adjustments, so more feedback around those for preferences would be appreciated19:47
ianwdiablo_rojo: i think you've got an openstack that should be an opendev at first glance : FileNotFoundError: [Errno 2] No such file or directory: '/home/zuul/src/openstack.org/opendev/ptgbot'19:48
diablo_rojoOh  I thought I had that as opendev originally. 19:48
diablo_rojoI can change that back19:48
ianwi think it has a high chance of working with that19:49
diablo_rojoSweet. 19:49
diablo_rojoWill do that now. 19:49
ianwspeaking of building images for external projects19:49
ianw#link https://review.opendev.org/c/openstack/project-config/+/79841319:49
ianwis there a reason lodgeit isn't in openstack?  i can't reference its image build jobs from system-config jobs, so can't do a speculative build of the image19:50
fungiyeah, the ptgbot repo is openstack/ptgbot19:50
fungithe puppet-ptgbot repo we'll be retiring is opendev/puppet-ptgbot19:50
fungidifferent namespaces19:50
ianwyeah, i think "opendev.org/openstack/ptgbot" is the path19:50
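
The FileNotFoundError quoted earlier comes from swapping the hostname and the namespace in the checkout path. Zuul places job checkouts under `src/<canonical hostname>/<project name>`, so a small sketch of how the expected path is assembled may help. `zuul_src_dir` is a hypothetical helper written for illustration, and the `/home/zuul` default is taken from the error message rather than from any job definition.

```python
import posixpath


def zuul_src_dir(canonical_hostname, project, home="/home/zuul"):
    """Build the directory where a job would expect a project checkout.

    Layout assumed: <home>/src/<canonical hostname>/<namespace>/<repo>.
    """
    return posixpath.join(home, "src", canonical_hostname, project)


# The repo is openstack/ptgbot, hosted on opendev.org.
print(zuul_src_dir("opendev.org", "openstack/ptgbot"))
# /home/zuul/src/opendev.org/openstack/ptgbot
```

The path in the failing job, `/home/zuul/src/openstack.org/opendev/ptgbot`, has `openstack` and `opendev` in each other's positions, which matches the diagnosis above.
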
clarkbianw: no I think it was one of the very first moves out to opendev and we probably just figured it was fine to be completely separate19:50
clarkbianw: we've learned some stuff since then19:50
ianwok, if we could add it with that review that would be helpful :)19:51
ianw#link https://review.opendev.org/c/opendev/system-config/+/79840019:51
clarkbianw: you may need a null include for that repo though19:51
clarkbianw: since its jobs are expected to be handled in the opendev tenant19:51
clarkbinclude: [] is what we do for gerrit just above in your change19:51
clarkbcorvus: ^ can probably confirm that19:51
fungiyeah, i think the expectation was that the rest would be moving to the opendev tenant in time, and then we could interlink them19:52
ianwit is working; https://104.130.239.208/ is a held node19:52
fungii've managed to move some more leaf repos into the opendev tenant, but things heavily integrated with system-config or openstack/project-config are harder19:52
ianwbut there is some sort of db timeout weirdness.  when you submit, you can see in the network window it gets redirected to the new paste but then it seems to take 60s for the query to return19:53
ianwi'm not yet sure if it's my janky hand-crafted server there or something systematic19:53
ianwsuggestions welcome19:53
clarkbianw: if you hack /etc/hosts locally wouldn't that avoid any redirect problems?19:54
clarkbmight help isolate things a bit. But I doubt that is a solution19:54
ianwi don't think it is name resolution; it really seems like the db, or something in sqlalchemy, takes that long to return19:55
ianwbut then it does, and further queries work fine19:55
clarkbit only happens the first time?19:55
clarkbWe are just about at time. I need lunch and then I have a large stack of changes and etherpads to review :) Thank you everyone! We'll be back here same time and place next week. As always feel free to reach out to us anytime on the mailing list or in #opendev19:56
ianwwhen you paste a new ... paste.  anyway, yeah, chat in #opendev19:56
fungithanks clarkb!19:57
clarkbya sorry, realized we should move along (not going to lie in part because I am now very hungry :) )19:57
clarkb#endmeeting19:57
opendevmeetMeeting ended Tue Jun 29 19:57:29 2021 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)19:57
opendevmeetMinutes:        https://meetings.opendev.org/meetings/infra/2021/infra.2021-06-29-19.01.html19:57
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/infra/2021/infra.2021-06-29-19.01.txt19:57
opendevmeetLog:            https://meetings.opendev.org/meetings/infra/2021/infra.2021-06-29-19.01.log.html19:57
