ianw | i'm running the rolling restart playbook limited to the executors in a root screen | 01:25 |
---|---|---|
ianw | #status log pruned backups on backup02.ca-ymq-1.vexxhost.opendev.org | 02:15 |
opendevstatus | ianw: finished logging | 02:15 |
opendevreview | Merged zuul/zuul-jobs master: upload-pypi: support API token upload https://review.opendev.org/c/zuul/zuul-jobs/+/849589 | 02:33 |
opendevreview | Merged zuul/zuul-jobs master: ensure-twine: make python3 default, ensure pip installed https://review.opendev.org/c/zuul/zuul-jobs/+/849598 | 02:33 |
opendevreview | Merged zuul/zuul-jobs master: upload-pypi: basic testing https://review.opendev.org/c/zuul/zuul-jobs/+/849593 | 02:37 |
opendevreview | Merged zuul/zuul-jobs master: upload-pypi: test sandbox upload https://review.opendev.org/c/zuul/zuul-jobs/+/849597 | 02:42 |
opendevreview | Merged openstack/project-config master: Remove testpypi references https://review.opendev.org/c/openstack/project-config/+/849757 | 02:47 |
opendevreview | Ian Wienand proposed opendev/system-config master: run-production-playbook: rename with original timestamp https://review.opendev.org/c/opendev/system-config/+/850123 | 02:49 |
opendevreview | Merged openstack/project-config master: twine: default to python3 install https://review.opendev.org/c/openstack/project-config/+/849758 | 02:52 |
opendevreview | Merged zuul/zuul-jobs master: upload-pypi: always test upload https://review.opendev.org/c/zuul/zuul-jobs/+/849903 | 02:53 |
opendevreview | Merged openstack/project-config master: pypi: use API token for upload https://review.opendev.org/c/openstack/project-config/+/849763 | 02:57 |
ianw | 1-6 is done ... sure takes a while | 05:38 |
dpawlik1 | Clark[m]: hey, yes, we can | 06:26 |
dpawlik1 | welcome back Clark[m] | 06:26 |
fbo[m] | Hi, I noticed that the AFS mirror sync is quite old for centos and epel: they were last synced 2 and 3 days ago (https://grafana.opendev.org/d/9871b26303/afs?orgId=1) | 08:23 |
*** ysandeep is now known as ysandeep|lunch | 08:38 | |
dpawlik1 | cc ianw ^^ | 09:14 |
ianw | kernel.org is reporting "@ERROR: max connections (150) reached -- try again later" | 09:36 |
ianw | rsync: failed to connect to pubmirror1.math.uh.edu (129.7.128.189): Connection timed out (110) -- epel | 09:38 |
ianw | ... both upstreams borked ... | 09:38 |
ianw | not much can be done, apart from switching them. i don't really have time right now, but will check on them tomorrow | 09:41 |
*** ysandeep|lunch is now known as ysandeep | 10:23 | |
*** rlandy|out is now known as rlandy | 10:30 | |
*** sfinucan is now known as stephenfin | 11:30 | |
*** dviroel_ is now known as dviroel | 11:35 | |
*** Guest5324 is now known as rcastillo | 13:07 | |
*** dasm|off is now known as dasm|ruck | 13:43 | |
fbo[m] | ianw: ok. thanks | 15:06 |
*** dviroel is now known as dviroel|lunch | 15:08 | |
*** ysandeep is now known as ysandeep|out | 15:17 | |
*** marios is now known as marios|out | 15:48 | |
*** dviroel_ is now known as dviroel | 16:15 | |
opendevreview | Merged openstack/project-config master: update Review-Priority label for nova related projects https://review.opendev.org/c/openstack/project-config/+/837595 | 16:21 |
corvus | i got an email from pypi... one of the projects i maintain has been designated a critical project (which means they're going to require 2fa on all the maintainer accounts). i never would have guessed which project. | 16:31 |
corvus | (and openstackci is a maintainer of this project too) | 16:32 |
fungi | yeah, openstackci is a maintainer of 30 such projects at my last count | 16:34 |
corvus | the overlap with my personal account: requestsexceptions | 16:34 |
fungi | ianw has already added 2fa for openstackci with a totp setup like we used for group github auth | 16:34 |
corvus | i wonder when that became critical to the python ecosystem | 16:35 |
fungi | however, when you do that you'll get a new notification the next time you release, which says eventually uploading via un/pw auth will be disabled for 2fa-using accounts and you should use an api token instead | 16:35 |
fungi | so ianw has also added token support and testing for the relevant roles in zuul-jobs | 16:36 |
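For anyone following along, a minimal sketch of what a token-based upload looks like with twine; the token value and dist path below are placeholders, not anything taken from this log:

```
# Hedged sketch: PyPI upload with an API token instead of username/password.
# The token string here is a placeholder; real tokens start with "pypi-".
export TWINE_USERNAME=__token__      # literal value required for token auth
export TWINE_PASSWORD=pypi-REPLACE_WITH_REAL_TOKEN
twine upload dist/*

# A sandbox-style test against TestPyPI (similar in spirit to the
# upload-pypi sandbox testing merged above) would point at the test index:
# twine upload --repository-url https://test.pypi.org/legacy/ dist/*
```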
fungi | the 2fa thing is a (in my opinion slightly knee-jerk) reaction to some pypi accounts getting compromised and used to upload maliciously modified new releases of projects, so i expect they're trying to contain the blast radius for future such incidents | 16:37 |
fungi | compromised via account takeover i mean, e.g. password guessing or reused/leaked passwords | 16:38 |
fungi | corvus: the silver lining, i guess, is that you can put in a request for a couple of free google auth dongles if your account is a maintainer of such a project | 16:40 |
corvus | "Any project in the top 1% of downloads over the prior 6 months is designated as critical. " | 16:48 |
corvus | maybe we ought to cut a new release of requestsexceptions. it's been 4 years. | 17:00 |
corvus | (though i doubt anything has changed. i guess we could add type hints to the one method in there.) | 17:01 |
fungi | that's nothing. i've been in a fairly active discussion in the python community about the lockfile library, because openstackci is also a maintainer of that. it's had no new releases since 2015 when openstack finally finished moving off of it to oslo.concurrency | 17:11 |
fungi | it's been effectively abandoned since ~2010, save for a brief period around 2014-2015 when openstack adopted it in order to add py3k support as a stop-gap until their projects could stop depending on it | 17:12 |
fungi | but the project page says quite prominently not to use that lib and lists some better alternatives, and that apparently hasn't stopped it from being in the top 1% of downloads | 17:13 |
fungi | a bit of searching on my part found that python-daemon still requires lockfile, and while openstack doesn't use python-daemon directly any more either it's required by ansible-runner which a few tripleo projects rely on, so at some point in 2018 lockfile quietly crept back into openstack's transitive dependency set/constraints list without anybody noticing | 17:15 |
fungi | both python-daemon and ansible-runner have bugs opened against them for a while suggesting moving off lockfile, but with no real progress | 17:19 |
*** rlandy_ is now known as rlandy | 17:59 | |
*** KendallNelson[m] is now known as diablo_rojo | 18:45 | |
*** diablo_rojo is now known as Guest5398 | 18:46 | |
*** Guest5398 is now known as diablo_rojo | 18:48 | |
*** Guest5179 is now known as clarkb | 19:03 | |
clarkb | I'm me again | 19:05 |
opendevreview | James Page proposed openstack/project-config master: Add OpenStack K8S charms https://review.opendev.org/c/openstack/project-config/+/849996 | 19:08 |
clarkb | jrosser: any chance OSA was using zuul + ansible v5 before we switched the default over? AlbinVass[m] found a deadlock that was causing post failures in our infra jobs and we think updating glibc will correct that and wondering if that was the source of your problems too | 19:09 |
fungi | ooh, i didn't even consider those might be related | 19:11 |
fungi | and yeah, i guess setting the ansible_version for the job would percolate down to the post-run log upload tasks from base as well? | 19:12 |
clarkb | yes | 19:12 |
*** rlandy_ is now known as rlandy | 19:12 | |
clarkb | and it occurred to me that maybe the openstack ansible team was an early ansible 5 adopter | 19:12 |
clarkb | infra-root: I'm working on putting together tomorrow's meeting agenda now while I've got internets and am not driving a car (hopefully no more driving long distances after today for a while) | 19:13 |
clarkb | please add anything I've missed and I'll try to send that soon | 19:13 |
jrosser | clarkb: we take ansible-core + collections | 19:16 |
clarkb | jrosser: for your nested ansible you do ^ but do you know if your ansible jobs overrode the default zuul ansible version to v5 previously (we default to v5 on opendev now but didn't when you first started having this problem) | 19:17 |
fungi | jrosser: just to be clear, version of ansible being run by the zuul executor for the job tasks, not the nested | 19:17 |
jrosser | oh no - we don’t ever touch that | 19:17 |
clarkb | ok maybe not the same problem then | 19:18 |
BlaisePabon[m]1 | Please skip this thread if you're not a developer of infrastructure. | 19:29 |
BlaisePabon[m]1 | It seems that as an infrastructure dev it is hard to show off my work, so I was wondering if there are any sample projects designed to highlight infrastructure. | 19:30 |
BlaisePabon[m]1 | Sort of the way the [swagger petstore shows Open API](https://petstore3.swagger.io/). | 19:31 |
BlaisePabon[m]1 | or the way the [gothinkster realworld app shows off frameworks](https://github.com/gothinkster/realworld). | 19:32 |
BlaisePabon[m]1 | Do any of us know of a resource like this? | 19:33 |
BlaisePabon[m]1 | If not... I suppose I will go ahead with https://vitrina.readthedocs.io/en/latest/ | 19:34 |
clarkb | I'm not aware of a deployment sandbox for developer tools (code review, ci etc) | 19:34 |
fungi | BlaisePabon[m]1: maybe it's a cheat answer, but we show off our infrastructure by running it as a public service, with public-facing monitoring/trending, publicly accessible apis, and managing it all through public code-reviewed git repositories | 19:34 |
fungi | (open source infrastructure, essentially) | 19:35 |
BlaisePabon[m]1 | Good answer. | 19:35 |
BlaisePabon[m]1 | I agree 100% | 19:35 |
fungi | as for how people with non-public infrastructure show theirs off, yeah i don't have the first clue | 19:36 |
BlaisePabon[m]1 | What I'm trying to do is just a tad different: | 19:36 |
BlaisePabon[m]1 | provide a platform independent guide for building infrastructure. | 19:36 |
fungi | oh | 19:36 |
BlaisePabon[m]1 | In fact, I might "cheat" and use OpenDev as a reference implementation. | 19:37 |
BlaisePabon[m]1 | With a sequence of annotated tests that will produce a report enumerating the components. | 19:37 |
fungi | but yeah, having running public instances of things like gerrit and zuul and the other services we run makes it very easy for people to quickly see them in action | 19:38 |
BlaisePabon[m]1 | Yes, I suffered in Enterprise computing for years and have no way of showing off my work. | 19:38 |
BlaisePabon[m]1 | Now I'm looking for a job and I figure it would be good to make a contribution that is also (uncharacteristically) self-serving. | 19:39 |
BlaisePabon[m]1 | (btw, did I mention that I'm looking for a job?) | 19:39 |
clarkb | fungi: any idea if the glibc update rolled out fully yet? https://zuul.opendev.org/t/openstack/build/25b69ab536844186b654a824379cebd9/log/job-output.txt#467 might be the same issue and unsure if that would've run against a newer glibc | 19:43 |
fungi | ianw was restarting the executors | 19:43 |
clarkb | ya it kinda looks like they all restarted by now (or the playbook stopped) | 19:44 |
fungi | looks like 07-12 are on a slightly newer revision than 01-06: https://zuul.opendev.org/components | 19:44 |
fungi | newer revision of zuul container i mean | 19:45 |
fungi | ze01-06 are running on the same container version as the schedulers and mergers | 19:45 |
*** tosky_ is now known as tosky | 19:45 | |
clarkb | oh then maybe they haven't restarted | 19:45 |
clarkb | and this ran on 06 which is in the not restarted list? | 19:46 |
corvus | i thought ianw said 1-6 are done | 19:47 |
corvus | though i agree 7-12 are newer | 19:48 |
corvus | let's double check the metadata of the images to see | 19:48 |
clarkb | corvus: yes I see in scrollback that 1-6 were reported to be done | 19:48 |
clarkb | but it is odd that the git sha matches on 1-6 and the schedulers | 19:48 |
corvus | 01 is running change 831222 | 19:49 |
corvus | 12 is running change 849795 | 19:49 |
corvus | so yeah, only 7-12 are running the glibc upgrade | 19:50 |
corvus | (this is via `docker image inspect` on the images those containers are running -- recall that we put change metadata in the docker images so we know what change's gate job built the image) | 19:51 |
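A rough sketch of that kind of check; the label key below is an assumption about how the image build jobs record the change, not something quoted from the log:

```
# Hedged sketch: read the change metadata recorded in the image labels.
# Dumping all labels avoids having to guess the exact label key.
docker image inspect --format '{{ json .Config.Labels }}' \
    zuul/zuul-executor:latest
# If the label key is known, it can be pulled out directly, e.g. (assumed key):
# docker image inspect \
#     --format '{{ index .Config.Labels "org.zuul-ci.change" }}' \
#     zuul/zuul-executor:latest
```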
BlaisePabon[m]1 | (correcting earlier message: a developer of infrastructure / infrastructure software, i.e. devops/SRE/infosec) | 19:57 |
fungi | corvus: clarkb: guessing here, maybe the restart of 01-06 was too early and the images hadn't updated on dockerhub (or pulling them got missed), but then 07-12 happened late enough they got the update? | 20:11 |
opendevreview | James Page proposed openstack/project-config master: HTTP check existing repositories https://review.opendev.org/c/openstack/project-config/+/850252 | 20:38 |
opendevreview | James Page proposed openstack/project-config master: Add OpenStack K8S charms https://review.opendev.org/c/openstack/project-config/+/849996 | 20:38 |
opendevreview | James Page proposed openstack/project-config master: HTTP check existing repositories https://review.opendev.org/c/openstack/project-config/+/850252 | 20:45 |
clarkb | fungi: ya that could be. fwiw I won't try to restart 1-6 now as I'm on the road to catch up with foundation staff that ended up nearby. I can probably do it tomorrow if not done by then | 20:45 |
opendevreview | James Page proposed openstack/project-config master: Add OpenStack K8S charms https://review.opendev.org/c/openstack/project-config/+/849996 | 20:46 |
fungi | i can restart them tonight my time when usage is lower | 20:48 |
*** dasm|ruck is now known as dasm|off | 21:27 | |
*** dviroel is now known as dviroel|out | 21:39 | |
clarkb | last call on meeting agenda stuff. Going to send that out nowish to make sure it gets out | 21:42 |
clarkb | ok sent | 21:48 |
fungi | i've got nothing | 21:48 |
*** rlandy is now known as rlandy|bbl | 21:56 | |
ianw | corvus/clarkb: interesting that only half worked ... the screen i restarted things in with the playbook is still open on bridge | 22:08 |
ianw | i checked and saw the updated release on dockerhub; but also the "remove containers" step took quite a while to start after 1-6 stopped ... | 22:11 |
ianw | i've messed up the release job it would seem | 22:12 |
fungi | ianw: does that playbook perform a pull before restarting, or did you do the pull yourself first? | 22:12 |
ianw | https://zuul.opendev.org/t/openstack/build/3c975369bdf44ae8a495cbcf0884a27e/console -- i guess this runs on the executor so no sudo. will fix post haste | 22:12 |
fungi | i'm around to review. do we need to reenqueue any failed releases? | 22:13 |
ianw | fungi: it has TASK [zuul-executor : Remove Zuul Executor containers] which i am assuming clears them so the next start pulls them | 22:13 |
ianw | fungi: so far that one i linked i guess | 22:13 |
fungi | yeah, build history just shows one post_failure | 22:14 |
opendevreview | Ian Wienand proposed zuul/zuul-jobs master: ensure-twine : remove ensure-pip https://review.opendev.org/c/zuul/zuul-jobs/+/850257 | 22:16 |
ianw | https://hub.docker.com/layers/zuul-executor/zuul/zuul-executor/latest/images/sha256-274776228b994497c45f0c6c96cd62564345de09c6ed683ba654ef3640d35229?context=explore was the one i checked on, which released "21 hours ago" | 22:21 |
ianw | it was 11:10am local time here, and i started the rolling playbook at ~11:25am | 22:23 |
opendevreview | Merged zuul/zuul-jobs master: ensure-twine : remove ensure-pip https://review.opendev.org/c/zuul/zuul-jobs/+/850257 | 22:58 |
ianw | looks like the uh.edu mirror has fixed epel | 23:14 |
ianw | kernel.org is still seeming to reject us with "too many connections" | 23:15 |
fungi | that tends to happen for a few days/weeks after some major distro they're mirroring makes a significant new release | 23:19 |
fungi | ianw: as for the executor upgrades, does removing the container remove the image? i thought it merely required the container to be recreated from a locally cached image | 23:19 |
Clark[m] | You need to pull it. That may have been the issue the others restarted after the hourly pulls happened | 23:20 |
fungi | that's what i was suspecting happened, and would certainly explain what we observed | 23:21 |
fungi | okay, i've reenqueued the failed release with: sudo zuul-client enqueue-ref --tenant=openstack --pipeline=release --project=openstack/manila-tempest-plugin --ref=refs/tags/1.9.0 | 23:21 |
ianw | fungi: oh, thanks, was just typing that :) | 23:22 |
fungi | unfortunately, release-openstack-python seems to be on its second retry already | 23:22 |
ianw | ok, so the restart playbook doesn't pull new images then i guess. probably by design? perhaps we should add an argument or something | 23:23 |
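A possible manual workaround until the playbook grows such an option, assuming the executors run from the latest published image (the compose variant only applies to docker-compose managed deployments):

```
# Hedged sketch: refresh the executor image by hand before restarting,
# then confirm which change the pulled image was built from.
docker image pull zuul/zuul-executor:latest
docker image inspect --format '{{ json .Config.Labels }}' \
    zuul/zuul-executor:latest
# For a docker-compose managed service, roughly the equivalent would be:
# docker-compose pull && docker-compose up -d
```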
fungi | mmm, that enqueue-ref seems to maybe have misinterpreted the --ref i passed | 23:24 |
fungi | it's showing a 0 ref in the status page | 23:24 |
ianw | it doesn't look happy | 23:24 |
ianw | yeah, that :) | 23:24 |
fungi | i wonder if there's a regression in zuul-client around enqueuing into ref-triggered pipelines | 23:25 |
fungi | this is the same syntax i've used with it in the past | 23:25 |
fungi | though we're running 0.0.5.dev19 not 0.1.0 | 23:26 |
opendevreview | Ian Wienand proposed opendev/system-config master: centos 7 mirror: switch upstream provider https://review.opendev.org/c/opendev/system-config/+/850260 | 23:31 |
ianw | i'm not sure, i always get lost with enqueue to periodic or post pipelines | 23:32 |
fungi | i also tried with --ref=1.9.0 but that didn't work either | 23:34 |
fungi | looks like it may be back to wanting --oldrev and --newrev too | 23:34 |
fungi | or at least --newrev | 23:35 |
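If the client really does want an explicit revision, the invocation would look roughly like this; the commit sha is a placeholder for whatever refs/tags/1.9.0 points at, not a value from this log:

```
# Hedged sketch: re-enqueue the tag with an explicit new revision.
# <sha-of-tagged-commit> is a placeholder.
sudo zuul-client enqueue-ref --tenant=openstack --pipeline=release \
    --project=openstack/manila-tempest-plugin \
    --ref=refs/tags/1.9.0 --newrev=<sha-of-tagged-commit>
```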
fungi | oh, though the rerun with just the tag name does seem to be getting farther even if the status page shows an odd ref for it | 23:40 |
ianw | looks like it uploaded | 23:41 |
ianw | thanks for that | 23:43 |
fungi | https://zuul.opendev.org/t/openstack/build/2e30e6c0b7a24928ac7726c29fc32c6c/console#4/0/11/localhost | 23:49 |
fungi | https://pypi.org/project/manila-tempest-plugin/ | 23:50 |
fungi | so looks like it did what we wanted, yeah | 23:50 |
fungi | but the build result page is a bit weird... | 23:50 |
fungi | Revision 0000000 | 23:50 |
fungi | Branch 1.9.0 | 23:51 |
fungi | other release build results show a ref like refs/tags/1.9.0 and no branch | 23:51 |
ianw | so did we end up restarting all the executors? | 23:56 |