Tuesday, 2020-09-22

*** hamalq has quit IRC01:01
*** hamalq has joined #opendev-meeting02:25
*** hamalq has quit IRC06:11
*** diablo_rojo has quit IRC06:34
*** hashar has joined #opendev-meeting06:49
*** hashar is now known as hasharAway10:08
*** hasharAway is now known as hashar10:55
*** hashar has quit IRC11:43
*** hashar has joined #opendev-meeting12:16
*** hashar has quit IRC14:29
*** hamalq has joined #opendev-meeting16:16
*** hamalq has quit IRC17:41
*** diablo_rojo has joined #opendev-meeting18:03
clarkbanyone else here for the opendev infra meeting?19:00
corvusoh that's me19:00
clarkbfungi mentioned he won't make it today19:00
clarkb#startmeeting infra19:01
openstackMeeting started Tue Sep 22 19:01:09 2020 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:01
openstackUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:01
*** openstack changes topic to " (Meeting topic: infra)"19:01
openstackThe meeting name has been set to 'infra'19:01
clarkb#link http://lists.opendev.org/pipermail/service-discuss/2020-September/000100.html Our Agenda19:01
clarkb#topic Announcements19:01
*** openstack changes topic to "Announcements (Meeting topic: infra)"19:01
clarkbPTG and Summit are fast approaching. If you plan to participate, now is a good time to register19:01
ianwo/19:01
clarkbschedules for all three should be up too so you can cross check your timezone and availability19:02
clarkbunfortunately I don't currently have links ready but if you need help finding anything I'm sure I can either find the info or know who to talk to19:02
diablo_rojoo/19:02
clarkbAlso the smoke is mostly gone now and tomorrow is the last day where my kids don't have school obligations for the next several months, so I'm going to take the day off and go do something in the rain19:03
clarkb#topic Actions from last meeting19:03
*** openstack changes topic to "Actions from last meeting (Meeting topic: infra)"19:03
clarkb#link http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-09-15-19.01.txt minutes from last meeting19:03
clarkbno recorded actions19:03
clarkb#topic Priority Efforts19:04
*** openstack changes topic to "Priority Efforts (Meeting topic: infra)"19:04
clarkb#topic Update Config Management19:04
*** openstack changes topic to "Update Config Management (Meeting topic: infra)"19:04
clarkbnb04.opendev.org has been cleaned up19:04
corvusso all of zuul+nodepool are running from container images now?19:04
clarkbcorvus: yes19:05
corvusis all of puppet-openstackci unused by us now?19:05
clarkbI'm not sure if we use the elasticsearch and logstash stuff out of there or not19:05
clarkbI think not so ya that may be completely unused by us now19:06
corvusthat's a major milestone :)19:06
clarkbya, and we're also using the upstream images for all services too19:07
clarkb*all zuul and nodepool services19:07
clarkbwhich is a good way to help ensure those function well for the real world19:07
*** hamalq has joined #opendev-meeting19:09
clarkb#link https://review.opendev.org/#/c/744821/ fetch sshfp dns records in launch node19:09
clarkb#link https://review.opendev.org/#/c/750049/ wait for ipv6 RAs when launching nodes19:09
clarkbthose are two launch node changes that would be good to land as we continue to roll out services with new config management on new hosts19:10
clarkbAny other config management related items to discuss?19:10
clarkb#topic OpenDev19:11
*** openstack changes topic to "OpenDev (Meeting topic: infra)"19:11
clarkbIt has been noticed that our Gitea project descriptions are not updated when we change them19:11
clarkbI believe that the current method to address this is to run the sync-gitea-projects.yaml playbook, which will do the slow resync of everything19:12
clarkbIf that is the current method, do we need to lock ansible on bridge when doing it to prevent other things from conflicting? I think so19:13
clarkbfungi also brought up that we should think about ways to do this automatically if we can manage it somehow19:13
corvusi'm trying to remember why we don't already do that19:14
ianwhow slow is slow?19:14
clarkbI want to say it took about 4 hours to run the sync playbook last time mordred ran it19:14
clarkbbut maybe that was before we rewrote it in python?19:14
ianwok, that counts as slow :)19:15
clarkbthinking out loud here: maybe we lock ansible on bridge, run the sync playbook and time it, then based on that we can consider doing a full sync each time?19:16
clarkbalso the ci job for gitea does two passes of project creation iirc. Maybe we can look at that for rough timing data19:16
ianw++19:17
corvusit's not clear to me that sync-gitea-projects.yaml will update the descr19:17
clarkbcorvus: oh interesting is this possibly a bug in our python too?19:18
corvusclarkb: yeah (or missing feature); a quick scan of the python looks like it only touches description on create19:18
clarkbneat19:18
corvussettings and branches get updated by sync-gitea-projects.yaml, but not the description19:18
corvusexpect it would be an easy fix19:18
clarkbnow I'm thinking we make a change that just updates descriptions and use the ci job to time it19:18
clarkbsince I'm pretty sure we do two passes in ci now19:19
corvus(also, i'm only at 90% confidence on this)19:19
clarkbcorvus: we can also use the ci job to check your theory on that19:19
clarkbI can take a look at that if no one else is able/interested. It may just be a day or two while I work through other things first19:19
clarkband if you are interested I'm happy for someone else to work on it too :)19:20
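corvus's reading is that the Python sync only sets a repo's description when the repo is created. The following is a minimal sketch of the missing update path, not the actual opendev sync code: it assumes the desired descriptions (from project metadata) and the current Gitea descriptions have already been loaded into dicts keyed by "org/repo", and builds, without sending, one request per stale repo against Gitea's `PATCH /api/v1/repos/{owner}/{repo}` endpoint. The function names and input shapes are hypothetical.

```python
import json
import urllib.request
from typing import Dict, List


def stale_descriptions(desired: Dict[str, str], current: Dict[str, str]) -> Dict[str, str]:
    """Return {"org/repo": new_description} for repos whose Gitea description differs.

    Repos missing from `current` are skipped, since the existing create
    path already sets their description.
    """
    return {
        name: descr
        for name, descr in desired.items()
        if name in current and current[name] != descr
    }


def build_patch_requests(base_url: str, token: str,
                         updates: Dict[str, str]) -> List[urllib.request.Request]:
    """Build (but do not send) one PATCH request per repo needing an update."""
    reqs = []
    for name, descr in sorted(updates.items()):
        owner, repo = name.split("/", 1)
        body = json.dumps({"description": descr}).encode("utf-8")
        reqs.append(urllib.request.Request(
            f"{base_url}/api/v1/repos/{owner}/{repo}",
            data=body,
            method="PATCH",
            headers={
                "Content-Type": "application/json",
                "Authorization": f"token {token}",
            },
        ))
    return reqs
```

Each request could then be sent with `urllib.request.urlopen`. Keeping the diff step separate from the HTTP step is what would let a CI job like the one clarkb describes check the "description changed" case without a second full resync.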
clarkbThe other gitea topic I added to the agenda is that gitea 1.12.4 has released. I've done the last few gitea upgrades. Curious if anyone else is interested in giving it a go.19:21
clarkbWith the minor releases it's mostly about double checking file diffs and editing them as necessary for our forked html content19:21
ianwi can give it a go19:22
clarkbOur CI job for that has decent coverage of the automation results, and if you really want to check the rendered web ui launching a gitea locally isn't too bad19:22
clarkbianw: thanks!19:22
clarkbOn the gerrit upgrade side of things I've been distracted by a number of operational issues the last few weeks so unfortunately no new updates here19:23
clarkbAny other OpenDev topics people would like to bring up?19:23
clarkb#topic General Topics19:25
*** openstack changes topic to "General Topics (Meeting topic: infra)"19:25
clarkb#topic Splitting puppet-else into service specific infra-prod jobs19:25
*** openstack changes topic to "Splitting puppet-else into service specific infra-prod jobs (Meeting topic: infra)"19:25
clarkbThis is something that ianw reminded me we had planned to do19:25
clarkbessentially we split the node definition(s) out of our manifests/site.pp and put them in manifests/service.pp then add new jobs to run puppet for that specific service instead of everything19:26
clarkbthe reason this is coming up is we've been having servers like elasticsearch servers crash on rax then ansible sits there waiting for them until the puppet-else job times out19:26
clarkbthis adds a lot of noise to our logs and it is hard to tell if things are working or not since they are lumped into a big basket19:26
clarkbI wonder if we should plan a sprint to make those changes and get through a number of them in a day or two19:27
ianwyeah, something else i can put on my short-term todo list19:28
clarkb(also if anyone knows how to convince ansible to timeout more quickly when ssh will never succeed that would be great too)19:28
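On clarkb's question about failing fast: one possible approach, which the meeting did not settle on, is to probe the SSH port with a short TCP timeout before handing hosts to a playbook and skip the ones that do not answer. This is a minimal sketch assuming a plain TCP connect to port 22 is a good enough proxy for "ssh will never succeed"; it only catches hosts that do not accept connections at all, and a host that accepts the connection and then hangs would still need a task-level timeout. The helper names are hypothetical.

```python
import socket
from typing import Iterable, List, Tuple


def ssh_port_open(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port completes within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks and connect timeouts.
        return False


def split_reachable(hosts: Iterable[str], port: int = 22,
                    timeout: float = 5.0) -> Tuple[List[str], List[str]]:
    """Partition hosts into (reachable, unreachable) by probing the SSH port."""
    reachable: List[str] = []
    unreachable: List[str] = []
    for host in hosts:
        if ssh_port_open(host, port, timeout):
            reachable.append(host)
        else:
            unreachable.append(host)
    return reachable, unreachable
```

The reachable list could be fed to the job's inventory or `--limit`, while the unreachable list gets reported, so a crashed elasticsearch server shows up as one clear failure instead of a job timeout.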
clarkbianw: I'm happy to help too, though for me having a day or two dedicated to it would likely help, maybe ping me when you intend on working on it and I'll start later in the day and we can sift through them?19:29
ianwok; hopefully it's all pretty mechanical19:30
ianwalways surprises though :)19:30
clarkbya I think the least mechanical part will be adding testinfra tests19:30
clarkbI think that was part of mordred's original goal so that we can switch out puppet for ansible+docker and have the tests confirm everything is still happy19:30
clarkbwithout needing to replace test frameworks19:30
clarkb#topic Bup and Borg Backups19:31
*** openstack changes topic to "Bup and Borg Backups (Meeting topic: infra)"19:31
clarkb#link https://review.opendev.org/741366 ready to merge when we are.19:31
clarkbkept this on the agenda as ianw mentioned I should19:31
clarkbI don't think it's incredibly urgent as bup continues to work, but being better prepared for focal and beyond is a good thing too19:31
clarkbianw: anything else to add on that one?19:32
corvuswe looking for another +2, or just waiting for a babysitter?19:33
clarkbcorvus: I think waiting for a babysitter (ianw mentioned he could do it, but there have been many many distractions since)19:33
corvuswell, there's a host bringup, so a bit more work than babysitting19:33
ianwyeah, just waiting for me to have a chunk of time to bring up the new server, and babysit19:33
clarkbmostly I think it gets deprioritized since bup is working19:33
ianwalthough there's been some chat about alternative providers19:33
* corvus touches wood19:33
ianwdo we want to put the bup server somewhere !rax?19:34
clarkbianw: the original goal with bup was to backup to >1 provider19:34
corvusi think 2 servers in 2 providers would be great19:34
clarkbcorvus: ++19:34
ianwany preference of the options we have?19:34
corvusi'd vote for rax + 1.19:35
clarkbianw: mnaser has recently indicated he'd be happy to host more things. Backups likely make sense there due to the use of ceph too19:35
clarkb(our backups will be replicated many times)19:35
mnaseryes, i meant to reply to the email, we have a lot of capacity of storage in mtl btw19:35
corvusi'll just say that at one point we *did* have rax +1, and the +1 exited the cloud business.  i really really hope (and i certainly don't expect) that to happen again.  but having been bitten once.19:35
ianwok, sounds like vexxhost mtl19:36
mnaserplease feel free to loop me in if you need quota bumps or anything like that19:36
clarkbmnaser: thank you!19:36
corvusrax+mtl sounds great :)19:36
ianwmnaser: thanks, will do19:36
clarkb#topic PTG Planning19:38
*** openstack changes topic to "PTG Planning (Meeting topic: infra)"19:38
clarkbAs mentioned earlier now is a good time to register if you plan to attend the PTG19:38
clarkb#link https://www.openstack.org/ptg/ Register for the PTG19:38
clarkb#link https://etherpad.opendev.org/opendev-ptg-planning-oct-2020 October PTG planning starts here19:38
clarkbI've added a number of topics to this etherpad19:38
clarkbwe are just over a month away so now is a great time to think about what we should be talking about during our PTG times19:39
clarkbFeel free to add input on the topics I've added or add your own19:39
clarkbif a particular topic is very important to you it might be a good thing to indicate your time availability next to the topic so we can include you19:39
clarkbAlso, I plan to use meetpad again as that worked well for us last time19:40
clarkbAny other PTG concerns or thoughts?19:41
clarkb#topic Switch fedora-latest to fedora-3219:43
*** openstack changes topic to "Switch fedora-latest to fedora-32 (Meeting topic: infra)"19:43
clarkb#link https://review.opendev.org/#/c/752744/19:43
clarkbI sent an email last week saying we'd make this change today19:43
clarkbI intend on approving the change after the meeting unless there are last minute objections19:43
clarkbI figure if anyone really really needs fedora-30 they can use fedora-30 directly as we work them off of it19:43
clarkbhoping that in the near future we'll delete the fedora-30 image entirely19:43
clarkbpart of the motivation here is that the old fedoras seem to be bitrotting with respect to ansible. Ansible isn't able to reliably manage systemd services on f31 for example19:44
clarkbGetting to the up to date fedora version seems important as a result19:44
clarkbianw: ^ any particular concerns from you on that topic? you probably do more fedora things than the rest of us19:44
ianwno, i mean we shouldn't really be using fedora-!latest in jobs, we've always said it's a rolling thing19:45
clarkbya the number of cases where fedora-30 is used explicitly is very small19:46
clarkbnodepool, ara, and dib19:46
clarkbnodepool and ara are being (or have been) updated and dib will just stop testing f30 builds I think19:46
clarkb#topic Open Discussion19:47
*** openstack changes topic to "Open Discussion (Meeting topic: infra)"19:47
clarkbhttps://review.opendev.org/#/c/752908/ is a change I'm hoping to get review(s) from someone with a fresh perspective19:47
clarkbI've had some initial concerns but have largely come around to thinking merging it is probably the most pragmatic thing19:48
ianwone thing was restarting zuul-web to pickup the new pf4 changes that were merged19:48
clarkbhoping that someone else can take a look and double check on that19:48
clarkbI'll probably approve it by the end of my work day if no one else looks, as I don't want the tripleo testing to flounder longer19:48
clarkbianw: typically those are really straightforward, you docker-compose down and docker-compose up -d in /etc/zuul-web or whatever the dir is19:49
ianwi'll check, yesterday there were unanswered questions19:49
ianwclarkb: is there a reason we don't CD deploy that?19:49
clarkbianw: if you want to do the zuul-web restart I'm around for another 5 or so hours and will happily back you up if something goes wrong19:49
clarkbianw: I think because sometimes you need to restart zuul-web and scheduler together19:50
clarkbcorvus: ^ is that overly cautious on our part?19:50
ianwahh, ok, yeah this is not an API change19:50
ianwbut i guess it could be, at some times19:50
clarkbya most of the time it's fine to just restart19:50
clarkboccasionally it isn't :)19:50
ianwi'll take a look then19:51
corvusi think it would be fine to cd zuul-web19:52
corvusbut the mechanics are tricky19:52
corvuszuul repo is in a different tenant, etc19:52
corvusreally want a url trigger or something for that, i'd think.19:53
clarkbwe check the docker-compose pull info in the gitea role to understand if we need to restart in a safe way (which is different than just down and up)19:53
ianwhrm, i docker restarted it, but it looks the same19:53
clarkbwe might be able to do something similar for zuul-web and get the hour delayed CD19:53
ianwwhich must mean what i thought would be new containers isn't19:53
corvusianw: i think it needs more than a restart for the container to be recreated with a new image19:54
clarkbyes I think that is the case19:54
corvusi think a docker-compose down/up ?19:54
clarkbya down then up -d is what I usually use19:54
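The distinction here: restarting a container keeps the container that was created from the old image, so a freshly pulled image is only picked up when the container is recreated, for example with `docker-compose down` followed by `docker-compose up -d`. Below is a minimal sketch of the "pull, then recreate only if the image changed" idea clarkb describes for the gitea role, applied to zuul-web. It compares the local image ID (via `docker image inspect --format '{{.Id}}'`) before and after `docker-compose pull`. It assumes the image is already present locally, and it does not handle the cases where zuul-web and the scheduler must be restarted together. The compose directory, image name and function names are parameters or hypothetical, not the actual opendev role.

```python
import subprocess
from typing import Callable, List

# A runner takes (argv, cwd) and returns stdout; injectable for testing.
Runner = Callable[[List[str], str], str]


def run(cmd: List[str], cwd: str) -> str:
    """Run a command in `cwd` and return its stdout, raising on failure."""
    return subprocess.run(cmd, cwd=cwd, check=True,
                          capture_output=True, text=True).stdout


def image_id(image: str, cwd: str, runner: Runner = run) -> str:
    """Return the local image ID for `image` (assumes the image exists locally)."""
    return runner(["docker", "image", "inspect", "--format", "{{.Id}}", image], cwd).strip()


def pull_and_recreate(compose_dir: str, image: str, runner: Runner = run) -> bool:
    """Pull images; if `image` changed, recreate containers. Returns True if recreated."""
    before = image_id(image, compose_dir, runner)
    runner(["docker-compose", "pull"], compose_dir)
    after = image_id(image, compose_dir, runner)
    if before == after:
        return False
    # A plain restart would reuse the old container (and therefore the old
    # image), so tear down and bring the service back up to use the new one.
    runner(["docker-compose", "down"], compose_dir)
    runner(["docker-compose", "up", "-d"], compose_dir)
    return True
```

Run periodically, something like this would approximate the hour-delayed CD mentioned above while skipping the down/up cycle when nothing new was pulled.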
ianwok, yeah, looks like that's in the bash history19:55
ianwmy usual source of best practice tips :)19:55
clarkbSounds like that may be it19:56
clarkbThank you everyone19:56
ianwyay, that got it :)19:56
clarkbwe'll be back here next week until then feel free to chat in #opendev or on service-discuss@lists.opendev.org19:56
corvusclarkb: thanks :)19:56
clarkb#endmeeting19:56
*** openstack changes topic to "Incident management and meetings for the OpenDev sysadmins; normal discussions are in #opendev"19:56
openstackMeeting ended Tue Sep 22 19:56:55 2020 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)19:56
openstackMinutes:        http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-09-22-19.01.html19:56
openstackMinutes (text): http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-09-22-19.01.txt19:57
openstackLog:            http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-09-22-19.01.log.html19:57
diablo_rojoThanks clarkb!20:00
*** gouthamr has quit IRC22:06
*** mnaser has quit IRC22:06
*** gouthamr has joined #opendev-meeting22:09
*** mnaser has joined #opendev-meeting22:10
*** gouthamr has joined #opendev-meeting22:11
