Wednesday, 2022-01-12

kevinzclarkb:fungi: That issue was due to one controller node not working properly, and I've fixed it. Sorry for the inconvenience.00:53
kevinzplease let me know if there is any further issue.00:54
fungioh, thanks kevinz!01:38
kevinzfungi: welcome01:50
opendevreviewJack Morgan proposed opendev/system-config master: Minor update to documentation.
jentoioI found a minor issue with the documentation so using it as a practice for gerrit.01:55
fungithanks, i approved it02:28
opendevreviewMerged opendev/system-config master: Minor update to documentation.
fungijentoio: ^ it's merged, thanks!03:01
jentoiofungi: great, thanks. I feel legit now ;)03:02
fungias you should03:02
opendevreviewAnanya proposed opendev/elastic-recheck rdo: Updating gitignore to ignore local config files
opendevreviewDmitriy Rabotyagov proposed openstack/project-config master: Add Backport-Candidate label to openstack-ansible ACL
opendevreviewDmitriy Rabotyagov proposed openstack/project-config master: Add Backport-Candidate label to openstack-ansible ACL
fricklerthe announcement of gerritbot "verification failed" messages has some false positives when rechecking a patch and there are results delivered from a non-voting pipeline like arm64, the latter leads to the message being repeated, see e.g. vs. 13:04
opendevreviewShnaidman Sagi (Sergey) proposed zuul/zuul-jobs master: Include podman installation with molecule
rlandy|ruckclarkb: fungi: hello ... has a bunch of patches that have jobs queued for 5 or 6 hours  - with no jobs started15:59
rlandy|ruckopenstack/tripleo-validations seem to be queuing up for some time16:00
fungithanks, looking16:11
clarkbI'm going to work on hrw's account cleanup. My plan is to double check both accounts to make sure there isn't some unexpected conflict, then assuming it is as expected retire the new account and delete its conflicting external ids16:24
fungirlandy|ruck: based on our utilization graphs we've been topped out at our maximum test capacity since roughly 08:00 utc. i definitely see changes getting nodes assigned, but keep in mind that zuul's "fair queuing" algorithm attempts to prevent projects from monopolizing available test resources by reducing their node request priority if they've already enqueued another change recently. so the16:24
fungimore changes request testing for a project, the longer those changes will take to get nodes assigned16:24
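
The fair queuing behaviour fungi describes can be illustrated with a minimal sketch. This is a simplified model, not Zuul's actual implementation: the function name, the tuple layout, and the sample queue are all hypothetical. It assumes each node request gets a priority equal to the number of changes the same project already has ahead of it, and that lower values are served first.

```python
from collections import defaultdict


def service_order(changes):
    """Return change ids in the order a simplified fair queue would serve them.

    `changes` is a list of (project, change_id) tuples in enqueue order.
    Hypothetical model: a request's relative priority is the number of
    changes the same project already has in the queue ahead of it.
    """
    ahead = defaultdict(int)
    requests = []
    for index, (project, change_id) in enumerate(changes):
        # Sort key: relative priority first, enqueue order as tie-breaker.
        requests.append((ahead[project], index, change_id))
        ahead[project] += 1
    return [change_id for _, _, change_id in sorted(requests)]


# Hypothetical queue: one busy project and one quiet project.
queue = [
    ("tripleo-validations", "A1"),
    ("tripleo-validations", "A2"),
    ("tripleo-validations", "A3"),
    ("bindep", "B1"),
]
print(service_order(queue))
```

In this model the quiet project's single change is served ahead of the busy project's second and third changes, even though it was enqueued last, which matches the point that more changes from one project means a longer wait for each of them.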
clarkbif any other infra-root would like to be walked through this process instead (it's a good exercise in interacting with gerrit as admin and with the new account system) let me know. You have a little bit as I settle in and catch up on email before I start :)16:25
rlandy|ruckfungi: k, makes sense - I was just checking that we were not hanging on jobs16:25
clarkbfungi: that was my take on it looking at the status page too16:25
fungirlandy|ruck: it appears the node request backlog peaked around 14:00 utc and has been rapidly falling, so unless there's another spike in node requests i expect the current burn rate to get us back to starting builds on-demand within the next few hours16:25
rlandy|ruckif they start when capacity allows, we're ok16:25
rlandy|rucksounds good - thank you16:26
fungirlandy|ruck: if you're interested, graphs are here:
fungithe most interesting ones in this case are probably "node requests" and "test nodes"16:26
rlandy|ruckright - that is useful, it makes it easier to estimate if things are getting better16:27
fungiyeah, i'm mostly looking at the node requests backlog there to attempt to predict when this should clear up16:27
fungiwhen we're not maxed out, that graph should normally read ~0 with the occasional blip16:28
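
The kind of estimate made from the node requests backlog graph can be sketched as simple arithmetic. The function name and all numbers below are hypothetical and are not read from the actual graphs; the sketch assumes constant rates, which a real backlog will not follow exactly.

```python
def hours_to_clear(backlog, fulfilled_per_hour, new_per_hour):
    """Estimate hours until the node request backlog returns to ~0.

    Assumes constant rates (a simplification). Returns None when the
    backlog is not shrinking, since no estimate is possible then.
    """
    net_burn = fulfilled_per_hour - new_per_hour
    if net_burn <= 0:
        return None
    return backlog / net_burn


# Hypothetical figures: 600 outstanding requests, 400 fulfilled per hour,
# 250 new requests arriving per hour.
print(hours_to_clear(600, 400, 250))
```

Another spike in node requests raises `new_per_hour`, and once it reaches the fulfilment rate the estimate no longer holds, which is the caveat fungi attaches to the prediction.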
rlandy|ruckthanks - next time I'll know where to look16:29
fungithe other indication is that the used line in the test nodes graph is consistently plateaued, indicating our effective maximum node capacity (the "max" line is misleading, it's a theoretical capacity based on max-servers values but our providers often set lower quotas or we end up with leaked undeletable cruft they have to clean up manually in some cases, but also not all flavors use the same16:31
fungiamount of ram/cpu/etc)16:31
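
The gap between the "max" line and the plateau of the used line can be sketched as follows. This is an illustrative model only: the `Provider` class, its fields, and the numbers are hypothetical, and it ignores the point that flavors differ in ram/cpu.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    max_servers: int  # configured max-servers value
    quota: int        # quota the provider actually grants
    leaked: int       # undeletable leftover instances still counted against us


def theoretical_max(providers):
    """Sum of max-servers values: what the "max" line would show."""
    return sum(p.max_servers for p in providers)


def effective_capacity(providers):
    """Usable nodes once lower quotas and leaked instances are accounted for."""
    return sum(max(0, min(p.max_servers, p.quota) - p.leaked) for p in providers)


# Hypothetical providers, not real figures.
providers = [
    Provider("provider-a", max_servers=100, quota=80, leaked=5),
    Provider("provider-b", max_servers=50, quota=60, leaked=2),
]
print(theoretical_max(providers), effective_capacity(providers))
```

With these made-up values the theoretical maximum is 150 while the effective capacity is 123, which is why a consistently flat used line below "max" still indicates that capacity is exhausted.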
clarkbhrw's account setup seems pretty straightforward. The email hrw requested to add to the old account is indeed the email associated with the new account's openid so can't be removed by the user. This means the previously described plan should work fine. We retire the new account then delete its external ids containing the conflicting email. This includes the openid. Then hrw can add16:42
clarkbthat email address manually when logged into the old account16:42
clarkbI'm going to proceed with that. Might take me a bit to page in how these cleanup scripts work, but shouldn't be any trouble once I'm back up to speed16:42
clarkbthe first step in the process has been completed16:47
clarkbnow I need to run the script to clean out the unwanted external ids for that account16:48
kopecmartinclarkb: you may go ahead with
clarkbkopecmartin: thank you for checking. I'll get to that next16:55
clarkbthe account modifications are done, I'm just recording my work in the usual location now16:55
clarkband removing my extra perms16:56
clarkbinfra-root ok I should be all done with the hrw account cleanup. hrw will need to manually add the email addr to the older account though as I didn't manually do that16:58
clarkbfungi: frickler: any objections to approving the refstack image update now? I think frickler's rereview might be the most important thing at this point if that is still a possibility17:00
opendevreviewClark Boylan proposed opendev/bindep master: Replace centos-8 with centos-8-stream
clarkbfungi: frickler ^ also thank you for calling out the weird parenting there. That was unnecessary and not desirable so I have shifted it around with a rebase17:03
clarkbwe did get new arm64 centos 8 stream images overnight so I've rechecked the ozj centos 8 cleanup change in hopes that the openafs package will build now17:08
fricklerclarkb: +2 on refstack17:09
fungiclarkb: refstack image update seems fine to approve now, yeah. i can do it as soon as i find that change again17:09
clarkbI'll get it now that frickler  +2'd17:09
fungiahh, cool17:09
clarkbfor the rename plugin I have no idea why I can't figure out the ssh command for it. Well, I mean the docs use text substitution which doesn't substitute in the repo but in some rendered form that I don't know how to see, so that's part of it. Anyway I'm going to pause on that for a bit while I catch up on other stuff but probably need to hold a node and inspect it more closely17:11
fricklerthe bindep change seems to have kept the +2s, so you can approve it once the checks succeed17:11
clarkbOnce I get a few of these changes out of my queue I'll feel better about doing some zuul reviews later today17:12
clarkbwoot the arm64 openafs build is working now17:19
opendevreviewMerged opendev/bindep master: Replace centos-8 with centos-8-stream
* clarkb looks for breakfast while the refstack change gates17:37
opendevreviewMerged opendev/system-config master: Update refstack image to bullseye
fungiclarkb: 824236 passes now!18:02
clarkbfungi: ya I think it was the old arm64 centos 8 stream image causing the problem. With the new image all the packages aligned with the running kernel and we were good18:03
clarkbrefstack's container just restarted18:05
clarkbit seems to be up and I can load the front page. kopecmartin not sure if there is any other checking you want to do18:05
clarkbfungi: do note that that change bumps up the openafs version too, but I think that should be fine (we go from a prerelease to a bugfix release of the same release)18:07
rlandy|ruckrcastillo|rover: hey18:17
rlandy|ruckclarkb: fungi: hi again ... wanted to introduce rcastillo|rover (usually just rcastillo). He's joined the Red Hat TripleO CI team and would like to get involved in some infra projects18:18
rcastillo|rovero/ nice to meet y'all18:18
rcastillo|roverwould love getting involved :)18:18
rlandy|ruckyep - so if you have any work he can start with ... please be in touch18:19
clarkbRight now the major things I'm working on are container maintenance as described by, Zuulv5 reviews (hashtag:sos), CentOS 8 image removal now that it is EOL, and then supporting ianw's Gerrit 3.4 upgrade efforts and fungi's mailman 3 upgrade efforts18:20
clarkbIf you're interested in the dedicated container users effort described in the container maintenance doc I think there is enough room there for us to split that up. jentoio will be helping with that too likely18:21
fungiwelcome rcastillo! aside from general user support, my broader focus over the next few months is going to be split between a couple of specs for improving our services:
fungiif you're interested in helping with either of those, let me know18:33
rcastillo|roverfungi: thanks! I'll take a look at both of those, the auth proposal interests me for sure18:47
fungircastillo|rover: on that front, we have a poc keycloak deployment already we've been testing, so there's some progress on it18:51
fungi#status log Restarted statusbot and gerritbot as they did not seem to gracefully cope with an apparent netsplit we experienced around 18:30 UTC18:53
opendevstatusfungi: finished logging18:53
fungiinfra-root: we got a ticket from rackspace to let us know they had to reboot the ethercalc server due to hypervisor host issues. it seems to be up and running fine so i'm going to close out the ticket they opened for it18:54
clarkb++ and thank you for following up on that too :)18:54
fungi#status log The ethercalc server was rebooted at 11:17 UTC due to a hypervisor host problem in our donor provider18:57
opendevstatusfungi: finished logging18:57
fungialso ianw's ticket about being unable to connect to the emergency console from fedora 33 with its default security settings was closed out claiming to be solved, though i'm not in a position to be able to test it19:29
clarkbI don't think we have fedora 33 instances anymore either19:30
fungiright, that's a big part of why i'm not in a position to be able to test it19:33
clarkb is still open against uwsgi which means the hack in is still our best bet for now19:38
clarkb821339 is our next step in bullseye updates for containers. Should I single core approve that? frickler  you might have time to take a look? (it's late for you though and this can probably wait for tomorrow)19:38
*** dviroel is now known as dviroel|afk20:04
fungi rlandy|ruck we caught up on the node request backlog in the last few minutes, so in theory you should have nodes assigned to all those builds now20:31
rlandy|ruckfungi: yep thanks - we seem good to go now20:32
rlandy|ruckhad a small panic this morning :) - but all the jobs are through now20:32
