Tuesday, 2021-02-02

-openstackstatus- NOTICE: The Gerrit service on review.opendev.org is being quickly restarted to apply a new security patch00:55
*** hamalq has quit IRC02:36
*** sboyron has joined #opendev-meeting07:02
*** sboyron_ has joined #opendev-meeting07:19
*** sboyron has quit IRC07:22
*** hashar has joined #opendev-meeting08:13
*** kopecmartin has quit IRC08:48
*** kopecmartin has joined #opendev-meeting08:50
*** zbr1 has joined #opendev-meeting11:15
*** zbr has quit IRC11:16
*** zbr1 is now known as zbr11:16
*** hashar is now known as hasharLunch12:29
*** hasharLunch is now known as hashar13:18
*** hashar is now known as hasharAway15:27
*** hasharAway is now known as hashar16:04
*** zbr1 has joined #opendev-meeting17:17
*** zbr has quit IRC17:18
*** zbr1 is now known as zbr17:18
*** hashar is now known as hasharDinner18:41
clarkbanyone else here for the meeting?19:01
clarkb#startmeeting infra19:01
openstackMeeting started Tue Feb  2 19:01:06 2021 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:01
openstackUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:01
*** openstack changes topic to " (Meeting topic: infra)"19:01
openstackThe meeting name has been set to 'infra'19:01
clarkb#link http://lists.opendev.org/pipermail/service-discuss/2021-February/000179.html Our Agenda19:01
kopecmartinhi o/19:01
ianwo/19:02
clarkbhello kopecmartin, your agenda item is near the tail end of the meeting; if that is a problem feel free to say something and we can cover it earlier (not sure what meeting timing is like for you)19:02
kopecmartinclarkb: it's fine, i'll wait :)19:02
clarkb#topic Announcements19:03
*** openstack changes topic to "Announcements (Meeting topic: infra)"19:03
clarkbI had no announcements19:03
clarkb#topic Actions from last meeting19:03
*** openstack changes topic to "Actions from last meeting (Meeting topic: infra)"19:03
*** sboyron_ has quit IRC19:03
clarkb#link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-01-26-19.01.txt minutes from last meeting19:03
clarkbfungi's request for help on config-core duties went out19:03
clarkbfungi: thank you for that19:03
clarkb#action clarkb begin puppet -> ansible and xenial upgrade audit19:04
clarkbI did not manage to find time for ^ so have added it back on19:04
clarkbianw: do we need to keep an action item for wiki backups or are those happy now?19:04
ianwnot done yet ...19:04
clarkb#action ianw figure out borg backups for wiki19:05
clarkbOk lets dive into our topics for today19:05
clarkb#topic Priority Efforts19:05
*** openstack changes topic to "Priority Efforts (Meeting topic: infra)"19:05
clarkb#topic OpenDev19:05
*** openstack changes topic to "OpenDev (Meeting topic: infra)"19:05
clarkbThe service coordination nominations period has finished.19:05
clarkbI didn't see anyone else volunteer by the weekend so I put my name in19:05
clarkb#link http://lists.opendev.org/pipermail/service-discuss/2021-January/000161.html Clarkb appears to be the only nomination19:05
clarkbI haven't seen any others since I made mine either.19:06
clarkbI think that means I'm it again, but if I missed something please call it out :)19:06
clarkbSince last weeks meeting I've done a bit of work on the Gerrit account inconsistencies problem19:07
clarkb#link https://etherpad.opendev.org/p/gerrit-user-consistency-2021 High level notes.19:07
clarkbI've started to try and keep high level notes there while keeping the PII out of the etherpad19:07
clarkbGroup problems and 81 accounts with preferred emails missing external ids have been fixed.19:07
clarkbthank you fungi for being an extra set of eyes while I worked through ^19:08
fungiany time!19:08
clarkbWe have 28 accounts with preferred email addresses that don't have a matching external id19:08
clarkbWe have ~642 accounts with conflicting emails in their external ids. This needs more investigation to better understand the fix for them.19:08
clarkbNeed to correct the ~642 external id issues before we can push updates to refs/meta/external-ids with Gerrit online.19:08
clarkbThe workaround is we can stop gerrit, push to external ids directly, reindex accounts (and groups?), start gerrit, then clear accounts caches (and groups caches?)19:08
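
A rough Python sketch of that offline workaround, assuming a docker-compose managed Gerrit under /home/gerrit2/review_site and a shared checkout of refs/meta/external-ids; all paths, the compose file location, and limiting the reindex to the accounts index are illustrative assumptions, not the commands that were actually run:

import subprocess

# Assumed locations for this sketch -- not the real production paths.
GERRIT_COMPOSE = "/etc/gerrit-compose/docker-compose.yaml"
GERRIT_SITE = "/home/gerrit2/review_site"
CHECKOUT = "/home/gerrit2/external-ids-checkout"   # shared checkout of refs/meta/external-ids

def run(*cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Stop Gerrit so nothing rewrites refs/meta/external-ids while we push.
run("docker-compose", "-f", GERRIT_COMPOSE, "down")
# Push the corrected commits from the shared checkout straight into All-Users.
run("git", "-C", CHECKOUT, "push", f"{GERRIT_SITE}/git/All-Users.git",
    "HEAD:refs/meta/external-ids")
# Offline reindex of the accounts index (groups may need the same treatment).
run("java", "-jar", f"{GERRIT_SITE}/bin/gerrit.war", "reindex",
    "--index", "accounts", "-d", GERRIT_SITE)
# Start Gerrit again, then flush the accounts cache over the SSH admin interface.
run("docker-compose", "-f", GERRIT_COMPOSE, "up", "-d")
run("ssh", "-p", "29418", "admin@review.opendev.org",
    "gerrit", "flush-caches", "--cache", "accounts")
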
clarkbI've given the next set of steps some thought and I think roughly it is:19:08
clarkbClassify users further into situation groups19:09
clarkbDecide on next steps for users depending on their situation group19:09
clarkbFix the preferred email issue if possible as this can be done with gerrit online19:09
clarkbStart a refs/meta/external-ids checkout in a shared location and begin committing fixes to it. If we can't push all the fixes as separate commits we can squash them together and then push.19:09
clarkbthat might be broken down further to do all the preferred email issues first as we can correct them online. Then do the external ids19:09
zbrclarkb: does this mean manually investigating and patching >600 accounts?19:09
clarkbAnother upside to doing it ^ that way is I expect some of the external id fixes will result in preferred email issues on the account side. If we fix the existing issues first we won't confuse them with any new ones we introduce19:10
clarkbzbr: yes19:10
fungiprobably semi-scripted at least19:10
fungiand with distinct classifications, some of them may be quick to blow through19:11
clarkbright the 81 we fixed already were 95% done with a script once we were satisfied with an earlier pass and classification19:11
clarkbdepending on how the classification for external ids goes I think downtime to correct a portion of them is an option as well19:11
clarkbthat will help us ensure that we're making changes that are viable once loaded into gerrit19:12
clarkbbut I don't want to make too strong of a plan for those until we start actually committing changes to that shared checkout19:12
zbrwhile this does not sound like a joy, if splitting the work can speed things up, i may give it a try.19:12
clarkbzbr: part of the problem here is that it is all PII so I think we need to be careful who we give access to. Currently it is just gerrit admins19:13
clarkb(you need to be admin to access the refs)19:13
zbrah.19:13
clarkbanyway, after fixing the first 81 accounts I spent a bit of time doing further classification but then got distracted19:14
clarkbI need to pick that back up again. I think that a good chunk of the remaining preferred email issues can be fixed like the first 8119:14
clarkbbut I need to actually make those lists and then see if others agree19:14
clarkbI'll be trying to pick this back up again this week19:14
clarkbOther Gerrit items:19:15
clarkbWe upgraded Gerrit to ~3.2.7 yesterday to patch a security issue19:15
clarkbI also tested that Gerrit's workinprogress state is handled by zuul properly when you approve changes. It appears to ignore workinprogress changes properly now19:15
clarkb(we expected it to since the fix was deployed, but needed to test with actual gerrit)19:16
clarkbianw and I have made some improvements to the gerrit testing too.19:16
clarkbthe selenium stuff is a bit better now and I added a test to check that the x/ clone workaround continues to work19:17
clarkb#link https://review.opendev.org/c/opendev/system-config/+/765021 Build 3.3 images19:17
clarkbI also resurrected my change to build 3.3 images. I don't think we're in a hurry to upgrade, doing the OS upgrade first seems like better prioritization, but having working image builds ready for us would be nice19:17
clarkbThat was what I had for OpenDev and Gerrit things. Anything else to add before we move on?19:18
ianwhrm that doesn't run the system-config job?19:18
clarkbianw: no, because currently the system-config job is 3.2 only19:18
clarkbI think in a followup I could add a system-config + 3.3 job19:18
clarkbor if people prefer can add it to this existing change19:19
ianwoh i see totally new jobs :)  i think it would be great to run it, either way19:19
clarkbok I'll probably start with a followup change then as that is slightly easier and take it from there19:20
clarkb#topic Update Config Management19:21
*** openstack changes topic to "Update Config Management (Meeting topic: infra)"19:21
clarkbI am not aware of anything to add to this other than maybe the refstack topic which we've got later on in the agenda19:21
clarkbMight be worth mentioning that I helped zuul fix an issue in zuul-registry that new buildx exposed. This was affecting our ability to do multiarch builds (things like nodepool builders)19:22
clarkbthat should be fixed now though. Thank you zbr for calling out the problem19:23
clarkb#topic General topics19:24
*** openstack changes topic to "General topics (Meeting topic: infra)"19:24
clarkb#topic OpenAFS cluster status19:24
*** openstack changes topic to "OpenAFS cluster status (Meeting topic: infra)"19:24
clarkbJust a quick status check on the openafs cluster. I think we still need to upgrade the db servers?19:24
clarkbianw: fungi: anything else to note about ^ ?19:25
fungithat's still the status afaik19:25
ianwyeah, i got distracted with other things.  high on my todo list :)19:26
fungiand then we can think through upgrading operating systems/replacing servers19:26
clarkbno worries, just making sure I (and others) are up to date19:26
clarkb#topic Bup and Borg Backups19:26
*** openstack changes topic to "Bup and Borg Backups (Meeting topic: infra)"19:26
clarkb#link https://review.opendev.org/c/opendev/system-config/+/77357019:26
clarkbI think that is the latest on borg? Would be good if fungi frickler and corvus could review that one19:27
zbrclarkb: i just realized why i did not get notifications from Zuul-jobs-failures mailing list... i did not whitelist the user.19:27
clarkbianw: feel free to fill us in on any and all relevant info for this topic though :)19:27
ianwyeah, i'm pretty focused on getting our working set to a reasonable level19:28
ianwunfortunately i didn't quite fully grok the implications of --append mode and the particular way borg implements that19:28
clarkb(I didn't either)19:28
ianwall the details are in the changelog of 77357019:29
fungii caught some of it last night before i passed out19:29
ianwanyway, a better way would be to do something like a rolling set of LVM snapshots on the server side19:30
fungii guess cow wouldn't help there because of the encryption layer19:30
fungior maybe it would, depends on if borg manages to not update most of the blocks when updating the backup19:31
ianwwe don't encrypt the backups, i think it would be ok19:31
clarkbI think cow would be fine the way we're using borg19:31
fungioh, right then19:31
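
A minimal sketch of the rolling server-side LVM snapshot idea, assuming the borg repositories live on a logical volume backup-vg/backups; the volume names, snapshot size, and retention count are placeholders, not the actual backup server layout:

import subprocess
from datetime import date

VG, LV = "backup-vg", "backups"   # assumed volume group / logical volume
KEEP = 7                          # assumed number of daily snapshots to retain

def run(*cmd):
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

# Take today's snapshot of the volume holding the borg repositories.
snap = f"{LV}-snap-{date.today().isoformat()}"
run("lvcreate", "--snapshot", "--size", "50G", "--name", snap, f"{VG}/{LV}")

# Drop the oldest snapshots beyond the retention window.
names = run("lvs", "--noheadings", "-o", "lv_name", VG).split()
snaps = sorted(n for n in names if n.startswith(f"{LV}-snap-"))
for old in snaps[:-KEEP]:
    run("lvremove", "-f", f"{VG}/{old}")
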
ianwanyway we can discuss in the review, but yeah i would like to get this all sorted and running by itself very soon19:33
clarkb++ to getting this sorted soon. I intend on looking at it much closer this afternoon. I want to catch up on the docs and related issues19:33
clarkbAnything else or should we move on to the next item?19:33
ianwnope, move on19:34
clarkb#topic Deploy a new refstack.openstack.org server19:34
*** openstack changes topic to "Deploy a new refstack.openstack.org server (Meeting topic: infra)"19:34
clarkbkopecmartin: has updated my old change to make a refstack container19:34
clarkb#link https://review.opendev.org/c/opendev/system-config/+/70525819:34
clarkbwe think it is just about ready to be landed, then we can spin up a new refstack server on bionic/focal (probably focal), make sure it works (kopecmartin has volunteered to help with this step), then migrate the data from the old instance to the new one with a scheduled downtime19:35
clarkbI think the main thing we need help with is someone to spin up the new instance, configure dns records, and ensure that LE and ansible and all that are happy19:36
ianwhappy to help with that19:36
clarkbcool. I'm happy to keep helping too, but worried that I'm not in a great spot to drive any single effort right now (as I'm assisting a bunch)19:37
clarkbI also expect we may need to learn us a refstack in order to figure out what the migration from old to new server will look like, but I'm 99% sure that can happen once we're happy the new deployment functions at all19:37
ianwcan it run without DNS pointed to it?19:37
ianwi haven't looked but i was imagining it would be a db import?19:38
clarkbianw: you might need to edit your local /etc/hosts to make everything happy but it should19:38
clarkbyes I believe it is a db import19:38
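
If the migration really is just a database copy, it might look something like this sketch run during the scheduled downtime; the server names, database name, and the assumption that a mysql client is reachable on both hosts are placeholders:

import subprocess

OLD = "refstack.openstack.org"     # existing server
NEW = "refstack01.opendev.org"     # assumed name for the new server
DB = "refstack"                    # assumed database name
DUMP = "/tmp/refstack.sql"

def ssh(host, *cmd):
    # ssh joins the arguments into one remote command line, so the
    # redirections below happen on the remote host, not locally.
    subprocess.run(["ssh", host, *cmd], check=True)

# Dump the database on the old server while its services are stopped...
ssh(OLD, "mysqldump", "--single-transaction", DB, ">", DUMP)
# ...copy the dump across, and load it into the new deployment
# (a containerized setup may instead load it via docker-compose exec).
subprocess.run(["scp", f"{OLD}:{DUMP}", f"{NEW}:{DUMP}"], check=True)
ssh(NEW, "mysql", DB, "<", DUMP)
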
*** hasharDinner is now known as hashar19:39
clarkbkopecmartin: do you know what kind of testing you think would be appropriate here?19:39
ianwspeaking of, should it be in our backup rotation?19:39
clarkbianw: probably19:39
kopecmartinclarkb: click on every button in the UI, try to upload new results files, register a new user maybe .. that kind of thing19:40
clarkbkopecmartin: got it, general usage19:40
clarkbmakes sense19:40
ianwok, we can tackle that separately.  i'll review 705258 and can try starting something19:40
clarkbI expect all that will work if you set /etc/hosts19:40
kopecmartinyeah, we dropped py2 support, so i'd like to exercise every function of the site19:40
kopecmartinluckily it's not that complex19:40
clarkbThank you ianw for helping out.19:41
clarkbAnything else on this topic?19:41
kopecmartinnot from my side, just let me know if you need anything19:41
clarkbwill do!19:41
clarkbThe next two items are on my plate and have been neglected due to other distractions. This is why I'm wary of diving into something new :/19:42
clarkb#topic Picking up steam on Puppet -> Ansible rewrites19:42
*** openstack changes topic to "Picking up steam on Puppet -> Ansible rewrites (Meeting topic: infra)"19:42
clarkbI have yet to write this etherpad, but I'm hopeful I'll get to it this week. I think it will give us good perspective and ability to prioritize effort19:42
clarkbNot really anything else to add to this. Other than thank you to everyone who has continued to push on migrating us off of puppet19:43
clarkb#topic inmotion cloud openstack as a service19:43
*** openstack changes topic to "inmotion cloud openstack as a service (Meeting topic: infra)"19:43
clarkbI'm hoping that tomorrow I can try turning this on and see what happens19:43
clarkbIf all goes well hopefully we'll be able to expand nodepool's resource pool19:44
clarkbit's been a while since I did one of these though so should be interesting to see how it goes19:44
clarkbI know they are interested in our feedback too, which always makes it easier when things are weird or not working19:45
clarkb#topic Open Discussion19:45
*** openstack changes topic to "Open Discussion (Meeting topic: infra)"19:45
clarkbAnything else that didn't make it on the agenda that you'd like to bring up?19:45
fungichange in vexxhost node memory?19:45
fungisomething we probably need to keep an eye on, as folks could start merging regressions for memory use more easily19:46
ianwi missed that, did it go up or down?19:46
fungior will generally start asking why not all of our nodes have 32gb ram19:46
clarkbianw: it went up to 32GB of memory19:46
clarkbthe risk is that changes could merge in vexxhost that cannot merge anywhere else19:47
fungi#link https://review.opendev.org/773710 Switch to using v3-standard-8 flavors19:47
ianwahh19:48
clarkbfungi: piecing together dansmith's question in #openstack-infra and some of what was discussed in #opendev, is this also thought to improve io in vexxhost?19:48
clarkbor are those separate concerns?19:48
ianwi feel like there was at some point something we did booting nodes with like kernel mem= parameters to keep them all the same19:48
ianwbut that's probably very silly, to have 32gb allocated but artificially limit to only 819:49
fungii'm not clear on whether it will improve i/o performance19:49
fungiianw: yeah, that's what i was referring to in my review comment19:49
clarkbianw: ya we did that to avoid the fear that we could merge things in one cloud and then break jobs in all the others19:49
fungialso we had to do it in bootloader configuration, which means applying it to all our providers19:50
clarkbback then you couldn't reboot with new kernel parameters, you can now so it would be a bit of a bandaid to do it now19:50
ianwwe could run them as static nodes and do a 1:4 reverse split :)19:51
clarkbthat is an interesting thought, static nodes seem like a pain though :)19:52
clarkbmaybe converting the set to a large k8s cluster and then scheduling into that with nodepool would make sense if we found infinite time somewhere :)19:53
ianwwe'll see how the bare-metal cloud thing works out :)19:53
clarkbdefinitely worth a brainstorm to think about other ways of slicing them19:53
clarkbI'll think it through on tomorrow's bike ride :)19:53
clarkbor try to anyway, it's probably going to be cold and my brain won't work19:53
ianwyeah the k8s cluster is probably actually a pretty sane thing to think about19:54
clarkbsounds like that may be all. Thank you everyone19:57
clarkb#endmeeting19:57
*** openstack changes topic to "Incident management and meetings for the OpenDev sysadmins; normal discussions are in #opendev"19:57
openstackMeeting ended Tue Feb  2 19:57:12 2021 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)19:57
openstackMinutes:        http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-02-02-19.01.html19:57
openstackMinutes (text): http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-02-02-19.01.txt19:57
openstackLog:            http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-02-02-19.01.log.html19:57
fungithanks clarkb!19:57
*** hashar has quit IRC21:40
*** diablo_rojo has joined #opendev-meeting22:53
