Monday, 2022-07-18

<ianw> i'm running the rolling restart playbook limited to the executors in a root screen  01:25
<ianw> #status log pruned backups on
<opendevstatus> ianw: finished logging  02:15
<opendevreview> Merged zuul/zuul-jobs master: upload-pypi: support API token upload
<opendevreview> Merged zuul/zuul-jobs master: ensure-twine: make python3 default, ensure pip installed
<opendevreview> Merged zuul/zuul-jobs master: upload-pypi: basic testing
<opendevreview> Merged zuul/zuul-jobs master: upload-pypi: test sandbox upload
<opendevreview> Merged openstack/project-config master: Remove testpypi references
<opendevreview> Ian Wienand proposed opendev/system-config master: run-production-playbook: rename with original timestamp
<opendevreview> Merged openstack/project-config master: twine: default to python3 install
<opendevreview> Merged zuul/zuul-jobs master: upload-pypi: always test upload
<opendevreview> Merged openstack/project-config master: pypi: use API token for upload
<ianw> 1-6 is done ... sure takes a while  05:38
<dpawlik1> Clark[m]: hey, yes, we can  06:26
<dpawlik1> welcome back Clark[m]  06:26
<fbo[m]> Hi, I noticed that the AFS mirror sync is quite old for centos and epel; they were synced 2 and 3 days ago (  08:23
*** ysandeep is now known as ysandeep|lunch  08:38
<dpawlik1> cc ianw ^^  09:14
<dpawlik1> is reporting "@ERROR: max connections (150) reached -- try again later"  09:36
<ianw> rsync: failed to connect to ( Connection timed out (110) -- epel  09:38
<ianw> ... both upstreams borked ...  09:38
<ianw> not much can be done, apart from switching them.  i don't really have time right now, but will check on them tomorrow  09:41
*** ysandeep|lunch is now known as ysandeep  10:23
*** rlandy|out is now known as rlandy  10:30
*** sfinucan is now known as stephenfin  11:30
*** dviroel_ is now known as dviroel  11:35
*** Guest5324 is now known as rcastillo  13:07
*** dasm|off is now known as dasm|ruck  13:43
<fbo[m]> ianw: ok. thanks  15:06
*** dviroel is now known as dviroel|lunch  15:08
*** ysandeep is now known as ysandeep|out  15:17
*** marios is now known as marios|out  15:48
*** dviroel_ is now known as dviroel  16:15
<opendevreview> Merged openstack/project-config master: update Review-Priority label for nova related projects
<corvus> i got an email from pypi... one of the projects i maintain has been designated a critical project (which means they're going to require 2fa on all the maintainer accounts).  i never would have guessed which project.  16:31
<corvus> (and openstackci is a maintainer of this project too)  16:32
<fungi> yeah, openstackci is a maintainer of 30 such projects at my last count  16:34
<corvus> the overlap with my personal account: requestsexceptions  16:34
<fungi> ianw has already added 2fa for openstackci with a totp setup like we used for group github auth  16:34
<corvus> i wonder when that became critical to the python ecosystem  16:35
<fungi> however, when you do that you'll get a new notification the next time you release, which says eventually uploading via un/pw auth will be disabled for 2fa-using accounts and you should use an api token instead  16:35
<fungi> so ianw has also added token support and testing for the relevant roles in zuul-jobs  16:36
<fungi> the 2fa thing is a (in my opinion slightly knee-jerk) reaction to some pypi accounts getting compromised and used to upload maliciously modified new releases of projects, so i expect they're trying to contain the blast radius for future such incidents  16:37
<fungi> compromised via account takeover i mean, e.g. password guessing or reused/leaked passwords  16:38
<fungi> corvus: the silver lining, i guess, is that you can put in a request for a couple of free google auth dongles if your account is a maintainer of such a project  16:40
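[editor's note: the totp setup mentioned above follows RFC 6238, which can be reproduced with the Python standard library alone. This is a generic sketch of the algorithm, not opendev's actual tooling; the secret used below is the RFC 6238 test-vector key, not a real credential.]

```python
import hashlib
import hmac
import struct
import time

def totp(secret, for_time=None, digits=6, step=30):
    """RFC 6238 TOTP: HMAC-SHA1 over the step counter, dynamically
    truncated to `digits` decimal digits."""
    if for_time is None:
        for_time = time.time()
    counter = int(for_time) // step
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: low nibble of the last byte picks the offset.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % (10 ** digits)).zfill(digits)

# RFC 6238 Appendix B test vector: ASCII key "12345678901234567890",
# T=59 seconds, 8 digits, SHA-1 -> "94287082"
print(totp(b"12345678901234567890", for_time=59, digits=8))  # 94287082
```

[the same shared secret, stored once for the group account, yields identical codes on every maintainer's authenticator app, which is what makes a shared totp setup workable.]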
corvus"Any project in the top 1% of downloads over the prior 6 months is designated as critical. "16:48
corvusmaybe we ought to cut a new release of requestsexceptions.  it's been 4 years.17:00
corvus(though i doubt anything has changed.  i guess we could add type hints to the one method in there.17:01
<fungi> that's nothing. i've been in a fairly active discussion in the python community about the lockfile library, because openstackci is also a maintainer of that. it's had no new releases since 2015, when openstack finally finished moving off of it to oslo.concurrency  17:11
<fungi> it's been effectively abandoned since ~2010, save for a brief period around 2014-2015 when openstack adopted it in order to add py3k support as a stop-gap until their projects could stop depending on it  17:12
<fungi> but the project page says quite prominently not to use that lib and lists some better alternatives, and that apparently hasn't stopped it from being in the top 1% of downloads  17:13
<fungi> a bit of searching on my part found that python-daemon still requires lockfile, and while openstack doesn't use python-daemon directly any more either, it's required by ansible-runner which a few tripleo projects rely on, so at some point in 2018 lockfile quietly crept back into openstack's transitive dependency set/constraints list without anybody noticing  17:15
<fungi> both python-daemon and ansible-runner have had bugs open against them for a while suggesting moving off lockfile, but with no real progress  17:19
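[editor's note: the transitive chain traced above (tripleo projects -> ansible-runner -> python-daemon -> lockfile) is the kind of thing a tiny dependency walk can surface. A hypothetical sketch follows; the requirement strings and the `chain` mapping are illustrative stand-ins, not real package metadata, and it assumes an acyclic graph.]

```python
import re

def requirement_name(spec):
    """Extract the bare distribution name from a requirement string
    such as 'lockfile (>=0.10)' or 'python-daemon>=2.1'."""
    return re.match(r"[A-Za-z0-9._-]+", spec.strip()).group(0)

# Simplified stand-in for real package metadata (no cycles assumed):
chain = {
    "ansible-runner": ["python-daemon"],
    "python-daemon": ["lockfile (>=0.10)"],
}

def pulls_in(root, target, deps):
    """Depth-first check: does `root` transitively require `target`?"""
    for spec in deps.get(root, []):
        name = requirement_name(spec)
        if name == target or pulls_in(name, target, deps):
            return True
    return False

print(pulls_in("ansible-runner", "lockfile", chain))  # True
```

[in practice one would feed this from `importlib.metadata.requires()` or a constraints file rather than a hand-written dict, but the walk is the same.]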
*** rlandy_ is now known as rlandy  17:59
*** KendallNelson[m] is now known as diablo_rojo  18:45
*** diablo_rojo is now known as Guest5398  18:46
*** Guest5398 is now known as diablo_rojo  18:48
*** Guest5179 is now known as clarkb  19:03
<clarkb> I'm me again  19:05
<opendevreview> James Page proposed openstack/project-config master: Add OpenStack K8S charms
<clarkb> jrosser: any chance OSA was using zuul + ansible v5 before we switched the default over? AlbinVass[m] found a deadlock that was causing post failures in our infra jobs and we think updating glibc will correct that, and wondering if that was the source of your problems too  19:09
<fungi> ooh, i didn't even consider those might be related  19:11
<fungi> and yeah, i guess setting the ansible_version for the job would percolate down to the post-run log upload tasks from base as well?  19:12
*** rlandy_ is now known as rlandy  19:12
<clarkb> and it occurred to me that maybe the openstack ansible team was an early ansible 5 adopter  19:12
<clarkb> infra-root: I'm working on putting together tomorrow's meeting agenda now while I've got internets and am not driving a car (hopefully no more driving long distances after today for a while)  19:13
<clarkb> please add anything I've missed and I'll try to send that soon  19:13
<jrosser> clarkb: we take ansible-core + collections  19:16
<clarkb> jrosser: for your nested ansible you do ^ but do you know if your ansible jobs overrode the default zuul ansible version to v5 previously (we default to v5 on opendev now but didn't when you first started having this problem)  19:17
<fungi> jrosser: just to be clear, the version of ansible being run by the zuul executor for the job tasks, not the nested one  19:17
<jrosser> oh no - we don't ever touch that  19:17
<clarkb> ok maybe not the same problem then  19:18
<BlaisePabon[m]1> Please skip this thread if you're not a developer of infrastructure.  19:29
<BlaisePabon[m]1> It seems that as an infrastructure dev it is hard to show off my work, so I was wondering if there are any sample projects designed to highlight infrastructure.  19:30
<BlaisePabon[m]1> Sort of the way the [swagger petstore shows Open API](
<BlaisePabon[m]1> or the way the [gothinkster realworld app shows off frameworks](
<BlaisePabon[m]1> Do any of us know of a resource like this?  19:33
<BlaisePabon[m]1> If not... I suppose I will go ahead with
<clarkb> I'm not aware of a deployment sandbox for developer tools (code review, ci etc)  19:34
<fungi> BlaisePabon[m]1: maybe it's a cheat answer, but we show off our infrastructure by running it as a public service, with public-facing monitoring/trending, publicly accessible apis, and managing it all through public code-reviewed git repositories  19:34
<fungi> (open source infrastructure, essentially)  19:35
<BlaisePabon[m]1> Good answer.  19:35
<BlaisePabon[m]1> I agree 100%  19:35
<fungi> as for how people with non-public infrastructure show theirs off, yeah i don't have the first clue  19:36
<BlaisePabon[m]1> What I'm trying to do is just a tad different:  19:36
<BlaisePabon[m]1> provide a platform independent guide for building infrastructure.  19:36
<BlaisePabon[m]1> In fact, I might "cheat" and use OpenDev as a reference implementation.  19:37
<BlaisePabon[m]1> With a sequence of annotated tests that will produce a report enumerating the components.  19:37
<fungi> but yeah, having running public instances of things like gerrit and zuul and the other services we run makes it very easy for people to quickly see them in action  19:38
<BlaisePabon[m]1> Yes, I suffered in Enterprise computing for years and have no way of showing off my work.  19:38
<BlaisePabon[m]1> Now I'm looking for a job and I figure it would be good to make a contribution that is also (uncharacteristically) self-serving.  19:39
<BlaisePabon[m]1> (btw, did I mention that I'm looking for a job?)  19:39
<clarkb> fungi: any idea if the glibc update rolled out fully yet? might be the same issue and unsure if that would've run against a newer glibc  19:43
<fungi> ianw was restarting the executors  19:43
<clarkb> ya it kinda looks like they all restarted by now (or the playbook stopped)  19:44
<fungi> looks like 07-12 are on a slightly newer revision than 01-06:
<fungi> newer revision of zuul container i mean  19:45
<fungi> ze01-06 are running on the same container version as the schedulers and mergers  19:45
*** tosky_ is now known as tosky  19:45
<clarkb> oh then maybe they haven't restarted  19:45
<clarkb> and this ran on 06 which is in the not restarted list?  19:46
<corvus> i thought ianw said 1-6 are done  19:47
<corvus> though i agree 7-12 are newer  19:48
<corvus> let's double check the metadata of the images to see  19:48
<clarkb> corvus: yes I see in scrollback that 1-6 were reported to be done  19:48
<clarkb> but it is odd that the git sha matches on 1-6 and the schedulers  19:48
<corvus> 01 is running change 831222  19:49
<corvus> 12 is running change 849795  19:49
<corvus> so yeah, only 7-12 are running the glibc upgrade  19:50
<corvus> (this is via `docker image inspect` on the images those containers are running -- recall that we put change metadata in the docker images so we know what change's gate job built the image)  19:51
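[editor's note: `docker image inspect` emits a JSON array, and change metadata of the kind described above would live in the image's labels. A minimal sketch of pulling it out; the label key `org.zuul-ci.change` and the sample JSON are assumptions for illustration, not necessarily the exact key opendev uses.]

```python
import json

# Trimmed stand-in for the output of `docker image inspect <image>`;
# real output carries many more Config fields than shown here.
inspect_output = json.loads("""
[{"Config": {"Labels": {"org.zuul-ci.change": "849795"}}}]
""")

def built_from_change(inspect_data):
    """Return the change number recorded in the image labels,
    or None if the label is absent."""
    labels = inspect_data[0].get("Config", {}).get("Labels") or {}
    return labels.get("org.zuul-ci.change")

print(built_from_change(inspect_output))  # 849795
```

[comparing this label across ze01-12 is what distinguishes the executors still on the old image from the ones carrying the glibc fix.]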
<BlaisePabon[m]1> * a developer of infrastructure, * infrastructure software (ie., * . devops/SRE/infosec)  19:57
<fungi> corvus: clarkb: guessing here, maybe the restart of 01-06 was too early and the images hadn't updated on dockerhub (or pulling them got missed), but then 07-12 happened late enough they got the update?  20:11
<opendevreview> James Page proposed openstack/project-config master: HTTP check existing repositories
<opendevreview> James Page proposed openstack/project-config master: Add OpenStack K8S charms
<opendevreview> James Page proposed openstack/project-config master: HTTP check existing repositories
<clarkb> fungi: ya that could be. fwiw I won't try to restart 1-6 now as I'm on the road to catch up with foundation staff that ended up nearby. I can probably do it tomorrow if not done by then  20:45
<opendevreview> James Page proposed openstack/project-config master: Add OpenStack K8S charms
<fungi> i can restart them tonight my time when usage is lower  20:48
*** dasm|ruck is now known as dasm|off  21:27
*** dviroel is now known as dviroel|out  21:39
<clarkb> last call on meeting agenda stuff. Going to send that out nowish to make sure it gets out  21:42
<clarkb> ok sent  21:48
<fungi> i've got nothing  21:48
*** rlandy is now known as rlandy|bbl  21:56
<ianw> corvus/clarkb: interesting that only half worked ... the screen i restarted things in with the playbook is still open on bridge  22:08
<ianw> i checked and saw the updated release on dockerhub; but also the "remove containers" step took quite a while to start after 1-6 stopped ...  22:11
<ianw> i've messed up the release job it would seem  22:12
<ianw> -- i guess this runs on the executor so no sudo.  will fix post haste  22:12
<fungi> ianw: does that playbook perform a pull before restarting, or did you do the pull yourself first?  22:12
<fungi> i'm around to review. do we need to reenqueue any failed releases?  22:13
<ianw> fungi: it has TASK [zuul-executor : Remove Zuul Executor containers] which i am assuming clears them so the next start pulls them  22:13
<ianw> fungi: so far that one i linked i guess  22:13
<fungi> yeah, build history just shows one post_failure  22:14
<opendevreview> Ian Wienand proposed zuul/zuul-jobs master: ensure-twine : remove ensure-pip
<ianw> was the one i checked on, which released "21 hours ago"  22:21
<ianw> it was 11:10am local time here, and i started the rolling playbook at ~11:25am  22:23
<opendevreview> Merged zuul/zuul-jobs master: ensure-twine : remove ensure-pip
<ianw> looks like the mirror has fixed epel  23:14
<ianw> is still seeming to reject us with "too many connections"  23:15
<fungi> that tends to happen for a few days/weeks after some major distro they're mirroring makes a significant new release  23:19
<fungi> ianw: as for the executor upgrades, does removing the container remove the image? i thought it merely required the container to be recreated from a locally cached image  23:19
<Clark[m]> You need to pull it. That may have been the issue; the others restarted after the hourly pulls happened  23:20
<fungi> that's what i was suspecting happened, and would certainly explain what we observed  23:21
<fungi> okay, i've reenqueued the failed release with: sudo zuul-client enqueue-ref --tenant=openstack --pipeline=release --project=openstack/manila-tempest-plugin --ref=refs/tags/1.9.0  23:21
<ianw> fungi: oh, thanks, was just typing that :)  23:22
<fungi> unfortunately, release-openstack-python seems to be on its second retry already  23:22
<ianw> ok, so the restart playbook doesn't pull new images then i guess.  probably by design?  perhaps we should add an argument or something  23:23
<fungi> mmm, that enqueue-ref seems to maybe have misinterpreted the --ref i passed  23:24
<fungi> it's showing a 0 ref in the status page  23:24
<ianw> it doesn't look happy  23:24
<ianw> yeah, that :)  23:24
<fungi> i wonder if there's a regression in zuul-client around enqueuing into ref-triggered pipelines  23:25
<fungi> this is the same syntax i've used with it in the past  23:25
<fungi> though we're running 0.0.5.dev19 not 0.1.0  23:26
<opendevreview> Ian Wienand proposed opendev/system-config master: centos 7 mirror: switch upstream provider
<ianw> i'm not sure, i always get lost with enqueue to periodic or post pipelines  23:32
<fungi> i also tried with --ref=1.9.0 but that didn't work either  23:34
<fungi> looks like it may be back to wanting --oldrev and --newrev too  23:34
<fungi> or at least --newrev  23:35
<fungi> oh, though the rerun with just the tag name does seem to be getting farther even if the status page shows an odd ref for it  23:40
<ianw> looks like it uploaded  23:41
<ianw> thanks for that  23:43
<fungi> so looks like it did what we wanted, yeah  23:50
<fungi> but the build result page is a bit weird...  23:50
<fungi> Revision 0000000  23:50
<fungi> Branch 1.9.0  23:51
<fungi> other release build results show a ref like refs/tags/1.9.0 and no branch  23:51
<ianw> so did we end up restarting all the executors?  23:56

Generated by 2.17.3 by Marius Gedminas - find it at!