corvus | i've gone from zero experience with synapse db upgrades/issues to "a tiny bit". so far, i'd say it's reasonable and not too difficult. i did encounter a bug the first time i ran the migration script, but it had already been fixed in the latest software (thus the upgrade). so, stuff will happen. i think this just reaffirms my view -- it's well within our capability to run, but even better if someone else does. :) | 00:02 |
---|---|---|
opendevreview | Ghanshyam proposed openstack/project-config master: Update retiring uc-recognition repo ACL to openstack/retired.config https://review.opendev.org/c/openstack/project-config/+/796971 | 00:32 |
opendevreview | Ghanshyam proposed openstack/project-config master: Update retiring ops-tags-team repo ACL to openstack/retired.config https://review.opendev.org/c/openstack/project-config/+/796972 | 00:35 |
opendevreview | Ghanshyam proposed openstack/project-config master: Update retiring workload-ref-archs repo ACL to openstack/retired.config https://review.opendev.org/c/openstack/project-config/+/796973 | 00:39 |
opendevreview | Ghanshyam proposed openstack/project-config master: End project gating for retiring arch-wg repo https://review.opendev.org/c/openstack/project-config/+/796962 | 00:42 |
opendevreview | Ghanshyam proposed openstack/project-config master: Update retiring enterprise-wg repo ACL to openstack/retired.config https://review.opendev.org/c/openstack/project-config/+/796974 | 00:46 |
opendevreview | Ghanshyam proposed openstack/project-config master: Update project gating for retiring project-navigator-data repo https://review.opendev.org/c/openstack/project-config/+/796975 | 00:51 |
opendevreview | Ghanshyam proposed openstack/project-config master: Update project gating for retiring governance-uc repo https://review.opendev.org/c/openstack/project-config/+/796976 | 00:57 |
opendevreview | Ghanshyam proposed openstack/project-config master: Update project gating for retiring workload-ref-archs repo https://review.opendev.org/c/openstack/project-config/+/796978 | 01:05 |
opendevreview | Ian Wienand proposed zuul/zuul-jobs master: Switch jobs to use fedora-34 nodes https://review.opendev.org/c/zuul/zuul-jobs/+/795636 | 01:10 |
opendevreview | Ian Wienand proposed zuul/zuul-jobs master: Ensure dnf-plugins-core before calling "dnf copr" https://review.opendev.org/c/zuul/zuul-jobs/+/796979 | 01:10 |
opendevreview | Ghanshyam proposed openstack/project-config master: Update project gating for retiring openstack-specs repo https://review.opendev.org/c/openstack/project-config/+/796980 | 01:13 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: fedora-container: install dnf-plugins-core https://review.opendev.org/c/openstack/diskimage-builder/+/796984 | 02:09 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: fedora-container: install dnf-plugins-core https://review.opendev.org/c/openstack/diskimage-builder/+/796984 | 02:10 |
opendevreview | Ian Wienand proposed zuul/zuul-jobs master: Switch jobs to use fedora-34 nodes https://review.opendev.org/c/zuul/zuul-jobs/+/795636 | 02:14 |
opendevreview | Merged zuul/zuul-jobs master: Ensure dnf-plugins-core before calling "dnf copr" https://review.opendev.org/c/zuul/zuul-jobs/+/796979 | 03:12 |
opendevreview | Ian Wienand proposed opendev/system-config master: review02 : bump heap limit to 96gb https://review.opendev.org/c/opendev/system-config/+/784003 | 03:20 |
opendevreview | Merged zuul/zuul-jobs master: Switch jobs to use fedora-34 nodes https://review.opendev.org/c/zuul/zuul-jobs/+/795636 | 03:30 |
opendevreview | Merged zuul/zuul-jobs master: ensure-zookeeper: better match return code https://review.opendev.org/c/zuul/zuul-jobs/+/793537 | 03:30 |
opendevreview | Merged opendev/system-config master: Add note about afs01's mirror-update vos releases to docs https://review.opendev.org/c/opendev/system-config/+/796893 | 03:39 |
opendevreview | Merged opendev/system-config master: review02 : bump heap limit to 96gb https://review.opendev.org/c/opendev/system-config/+/784003 | 03:53 |
diablo_rojo | ianw, around? | 04:02 |
ianw | diablo_rojo: yep! | 04:02 |
diablo_rojo | ianw, I was going to start looking at converting the puppet-ptgbot to use ansible if you've got any pointers or examples I should look at? | 04:03 |
diablo_rojo | I'm excited to take a stab at it :) | 04:03 |
ianw | cool, umm let me see ... | 04:04 |
diablo_rojo | Yeah no rush. I won't be up for a ton longer, I just wanted to message you before I forgot today. This week has gotten away from me. | 04:05 |
ianw | the first thing will be to create a container with ptgbot in it | 04:05 |
ianw | it looks like a pretty standard python app | 04:06 |
diablo_rojo | Yeah I think ttx tried to keep it pretty simple. | 04:06 |
diablo_rojo | How would I go about creating the container? (pardon my ignorance please :) ) | 04:06 |
ianw | haha ignorance not even vaguely considered :) | 04:07 |
ianw | https://opendev.org/opendev/statusbot/commit/6da21b94992661aa9596c746c7bcbf60cf9c2ac2 would be an example | 04:08 |
ianw | the only part of this we can't pre-test is that secret | 04:08 |
diablo_rojo | Okay so docker file and a yaml. | 04:09 |
diablo_rojo | How does that get generated? | 04:09 |
ianw | that Dockerfile would be basically correct, it would need a different command of course | 04:09 |
ianw | that command would be currently in puppet | 04:09 |
diablo_rojo | ianw, coolio, yeah seems easy enough. | 04:10 |
diablo_rojo | the command for generating the secret? | 04:10 |
ianw | sorry i mean the startup command for the daemon | 04:10 |
ianw | the secret an infra-root will need to generate for the ptgbot project from our docker key | 04:11 |
ianw | https://opendev.org/opendev/puppet-ptgbot/src/branch/master/files/ptgbot.init | 04:11 |
diablo_rojo | I think I am following so far :) | 04:11 |
ianw | yeah, so currently (or previously) it was installed and then started via ^^ that init script | 04:12 |
ianw | so instead the daemon will want to run in the container | 04:12 |
ianw | once we have the container, we should write a role in system-config to deploy it | 04:12 |
ianw | that would look a lot like https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/statusbot | 04:13 |
diablo_rojo | so would those be two separate patches? One to setup the container and another to actually write the ansible role? | 04:13 |
ianw | yes, i'd build and publish the container from the ptgbot project, and then system-config will consume it | 04:13 |
ianw | writing the role is where you'll want to more-or-less translate what's happening in puppet-ptgbot to ansible | 04:14 |
diablo_rojo | Okay so the changes will be in opendev/puppet-ptgbot and opendev/system-config? | 04:14 |
diablo_rojo | Err maybe not. | 04:14 |
ianw | puppet-ptgbot won't be used any more; it's essentially an exercise in converting that to ansible | 04:15 |
ianw | https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/statusbot/tasks/main.yaml are the broad strokes of it | 04:15 |
diablo_rojo | Okay so the container stuff will live in openstack/ptgbot then? | 04:15 |
ianw | make config directories, deploy config files, that sort of thing | 04:15 |
diablo_rojo | Okay. I think I am still following :) | 04:16 |
ianw | and then in system-config there will be a docker-compose file to start the service, like https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/statusbot/files/docker-compose.yaml | 04:16 |
ianw | that's where you map in config files, log volumes, whatever, from the underlying host | 04:16 |
diablo_rojo | Okay. | 04:17 |
ianw | yeah; the theory should be that the ptgbot container is a generic thing that theoretically anybody could use | 04:17 |
ianw | (i mean realistically we'll be the only consumer i'd say, but it makes for nice separation of concerns) | 04:17 |
diablo_rojo | Right okay. That makes sense :) | 04:18 |
ianw | so once you have the ptgbot role, you'll want to add it to the eavesdrop playbook | 04:18 |
ianw | https://opendev.org/opendev/system-config/src/branch/master/playbooks/service-eavesdrop.yaml | 04:18 |
diablo_rojo | Okay easy enough :) | 04:19 |
ianw | at that point, you can test it in the gate | 04:19 |
diablo_rojo | Cool :) And hopefully doesn't explode. | 04:19 |
ianw | i can almost guarantee it will at first :) | 04:20 |
diablo_rojo | I will try to get a WIP posted this weekend/early next week for at least the container stuff and docker image. | 04:20 |
ianw | https://zuul.opendev.org/t/openstack/builds?job_name=system-config-run-eavesdrop is the job that will run | 04:21 |
diablo_rojo | Of course :) Just hopefully not *that* bad. | 04:21 |
diablo_rojo | Okay cool. | 04:22 |
diablo_rojo | I think that's all I need for now? Anything else I should be aware of? | 04:22 |
ianw | we can put the job on hold and you can live inspect and fiddle with it | 04:22 |
diablo_rojo | I know where to find you when I inevitably will have more questions :) | 04:22 |
diablo_rojo | Oh sweet. | 04:22 |
ianw | i can generate the secret for the ptgbot project and paste that for you to use when ready | 04:23 |
ianw | we shouldn't *have* to have that published to dockerhub for this to work | 04:23 |
diablo_rojo | Oh nice. That would be helpful :) I will give you a heads up when I am ready. | 04:24 |
ianw | we use an intermediate registry, so earlier jobs push their images to it, and later jobs download from it. so everything can hang together speculatively based on Depends-On | 04:24 |
ianw | the system-config-run-eavesdrop job may need some tweaking to require the ptgbot jobs, etc. for that to hang together | 04:25 |
diablo_rojo | Okay. Noted. | 04:26 |
ianw | you'll probably also want to setup things like letsencrypt certs; is it staying an openstack thing or should it be ptg.opendev.org? | 04:28 |
diablo_rojo | I'm guessing the latter? | 04:29 |
ianw | in that case you'll want a CNAME added to https://opendev.org/opendev/zone-opendev.org/src/branch/master/zones/opendev.org/zone.db to point ptgbot.opendev.org -> eavesdrop01.opendev.org (and an _acme-challenge record for letsencrypt) | 04:30 |
ianw | we should redirect ptgbot.openstack.org to that as well. unfortunately only an infra-root can manage openstack.org as it has to be done via RAX's interface | 04:31 |
ianw | i can however pre-add _acme-challenge.ptgbot.openstack.org now though, which will allow us to get a letsencrypt certificate covering it | 04:32 |
ianw | https://docs.opendev.org/opendev/system-config/latest/letsencrypt.html should be a pretty good overview of the letsencrypt process; lots of examples in the code now | 04:33 |
diablo_rojo | Okay just a few more steps :) | 04:34 |
ianw | yep, but you will want to have that setup in the initial change, because you'll need the certificates for setting up the webserver | 04:34 |
ianw | in testing, we just make self-signed certs | 04:35 |
diablo_rojo | Ohh okay. | 04:35 |
diablo_rojo | Thanks for all the info ianw :) I am going to head to bed. Enjoy the rest of your day! | 04:43 |
ianw | diablo_rojo: no worries, later! | 04:43 |
ianw | corvus / mordred: a question for when you're around : it's openstack/ptgbot -- in theory i guess we should publish a container for it under openstackorg (https://hub.docker.com/u/openstackorg) ... so far nothing is published there | 04:58 |
ianw | in practice it feels more like an opendev thing, i don't know | 04:59 |
ianw | diablo_rojo: i've generated the secrets section for both options @ http://paste.openstack.org/show/QR8BeDPufFWCNqvZVgRm/ ... you can copy-paste either of those depending on if we want to publish it under opendev or openstack | 05:00 |
*** ykarel|away is now known as ykarel | 05:23 | |
*** marios is now known as marios|ruck | 06:02 | |
*** jpena|off is now known as jpena | 07:18 | |
*** rpittau|afk is now known as rpittau | 08:17 | |
yoctozepto | infra-root: ethercalc is down | 08:19 |
*** ykarel is now known as ykarel|lunch | 09:00 | |
*** raukadah is now known as chandankumar | 09:26 | |
frickler | yoctozepto: what issue do you see exactly? seems to be working fine for me. though the instance has an uptime of only ~2d ... | 09:28 |
yoctozepto | frickler: it works now; it was not responding (timeout) | 09:37 |
*** ykarel|lunch is now known as ykarel | 10:13 | |
opendevreview | chandan kumar proposed openstack/project-config master: Enable publish-openstack-python-tarball job https://review.opendev.org/c/openstack/project-config/+/797049 | 10:22 |
*** jpena is now known as jpena|lunch | 11:41 | |
fungi | ianw: the foundation has an osf/ git namespace which they're probably going to want renamed to openinfra/ at some point | 11:48 |
fungi | oh, but you're talking about dockerhub not opendev git/gerrit | 11:49 |
*** bhagyashris_ is now known as bhagyashris | 11:50 | |
*** ysandeep is now known as ysandeep|brb | 11:58 | |
fungi | ianw: diablo_rojo_phone: the ensuing discussion around https://review.opendev.org/780947 indicated that we might want it to be ptg.openinfra.dev since the ptg is an event put on by the foundation | 12:02 |
fungi | but we can decide that closer to completion of the conversion | 12:03 |
*** whayutin is now known as weshay | 12:07 | |
rosmaita | hello! when someone has a few minutes, i still can't get my brick-cinderclient-dsvm-functional-py36 job working, even with the ubuntu-bionic nodeset -- there may be some required option I'm not setting. Error is here: https://zuul.opendev.org/t/openstack/build/fc3c9eed53d24157983e35cac1eb5ad9 | 12:34 |
*** jpena|lunch is now known as jpena | 12:38 | |
fungi | rosmaita: that might be better discussed in #openstack-qa since the error indicates you're not passing some expected values to the devstack playbook the openstack qa team maintains... to be honest i'm not that familiar with it | 12:41 |
rosmaita | fungi: thanks! | 12:41 |
*** ysandeep|brb is now known as ysandeep | 12:54 | |
opendevreview | Merged openstack/project-config master: Update retiring uc-recognition repo ACL to openstack/retired.config https://review.opendev.org/c/openstack/project-config/+/796971 | 13:00 |
opendevreview | Merged openstack/project-config master: Update retiring ops-tags-team repo ACL to openstack/retired.config https://review.opendev.org/c/openstack/project-config/+/796972 | 13:03 |
opendevreview | Merged openstack/project-config master: End project gating for retiring arch-wg repo https://review.opendev.org/c/openstack/project-config/+/796962 | 13:07 |
opendevreview | Merged openstack/project-config master: Update retiring enterprise-wg repo ACL to openstack/retired.config https://review.opendev.org/c/openstack/project-config/+/796974 | 13:08 |
opendevreview | Merged openstack/project-config master: Update project gating for retiring project-navigator-data repo https://review.opendev.org/c/openstack/project-config/+/796975 | 13:09 |
opendevreview | Merged openstack/project-config master: Update project gating for retiring governance-uc repo https://review.opendev.org/c/openstack/project-config/+/796976 | 13:09 |
opendevreview | Merged openstack/project-config master: Update project gating for retiring workload-ref-archs repo https://review.opendev.org/c/openstack/project-config/+/796978 | 13:09 |
opendevreview | Merged openstack/project-config master: Update project gating for retiring openstack-specs repo https://review.opendev.org/c/openstack/project-config/+/796980 | 13:09 |
*** raukadah is now known as chandankumar | 13:13 | |
*** ysandeep is now known as ysandeep|away | 13:37 | |
*** rpittau is now known as rpittau|afk | 14:09 | |
*** marios is now known as marios|ruck | 14:34 | |
mordred | linux australia has made a public matrix "space" (which is still a beta feature, but they're trialing it) https://matrix.to/#/#linux-australia:matrix.org ... in case looking at how such a thing works is useful to folks. In element you need to enable the experimental spaces feature (settings -> labs) | 14:40 |
*** jpena is now known as jpena|out | 14:48 | |
clarkb | mordred: is there a tldr on what the space is? like a super channel? | 15:20 |
mordred | It's a named collection of channels | 15:20 |
clarkb | infra-root: it also appears that LE is still failing to update on a number of servers. I'll be looking into why that is still failing after fixing nb03's disk situation next | 15:21 |
mordred | Although in matrix itself it's implemented as a channel that contains channels | 15:21 |
mordred | But it's a way to curate and name related collections of things. It's also not exclusive, so a given channel can be in multiple spaces | 15:22 |
mordred | One could imagine an openstack space with ALL of the openstack channels plus #opendev. And a zuul space with zuul, opendev, ansible, gerrit, etc | 15:23 |
fungi | clarkb: one theory was that we're running up against le cert limits for static.o.o since we're getting individual certs for all the sites rather than a single cert with them all as altnames | 15:23 |
mordred | So for a new user to a given sub-community they can just add the space and not have to hunt for various things they might want to be in | 15:24 |
fungi | mordred: so it's customized indices of channels, essentially? | 15:24 |
mordred | fungi: Ugh | 15:24 |
mordred | fungi: yes! | 15:24 |
clarkb | fungi: that would only affect those names though I think and not review. But could be they look at the aggregate too | 15:25 |
fungi | oh, review was also impacted? | 15:25 |
fungi | the expiring cert warnings i saw today were mostly stuff hosted on static, like the legacy git redirects | 15:26 |
fungi | also entirely possible we have more than one problem | 15:26 |
clarkb | fungi: yes review is in the list of emails too | 15:26 |
fungi | ahh, okay | 15:26 |
clarkb | infra-prod-letsencrypt (the job) has succeeded since I fixed nb03 | 15:28 |
clarkb | Gerrit reports [Fri Jun 18 06:47:44 UTC 2021] Using CA: https://acme.zerossl.com/v2/DV90 (note that isn't letsencrypt) and later it fails due to Can not resolve _eab_id. I have found https://github.com/acmesh-official/acme.sh/wiki/Change-default-CA-to-ZeroSSL which | 15:30 |
clarkb | s/which// | 15:30 |
clarkb | I wonder if we're consuming acme.sh from not releases and it is in a broken transitory state | 15:31 |
clarkb | But also that seems like a good way to make people mad | 15:31 |
clarkb | https://github.com/acmesh-official/acme.sh/blob/dev/acme.sh#L32-L33 yup I think this is our problem | 15:35 |
clarkb | https://github.com/acmesh-official/acme.sh/blob/dev/acme.sh#L3541-L3547 is the error we are hitting fwiw. But I think we can just change over the server value and we'll be ok? | 15:40 |
*** dviroel is now known as dviroel|busy | 15:46 | |
opendevreview | Clark Boylan proposed opendev/system-config master: Be explicit about server used in acme.sh https://review.opendev.org/c/opendev/system-config/+/797136 | 15:51 |
clarkb | infra-root ^ I think that may fix things for us (or at least return us to using LE instead of zerossl then we'll find the next problem) | 15:51 |
fungi | i guess https://github.com/acmesh-official/acme.sh/issues/3556 and https://github.com/acmesh-official/acme.sh/issues/3557 are related | 15:51 |
clarkb | I think we'd probably be ok with zerossl after a quick look, but it isn't working and I want working more than I need to get into whether zerossl or le is better :) | 15:53 |
clarkb | infra-root another option would be to use https://github.com/acmesh-official/acme.sh/tree/2.9.0 and pin to that. I worry that if LE makes changes to their provisioning process that will be problematic for us. I think it is better to continue to roll forward | 15:55 |
fungi | clarkb: probably we need to have an account at zerossl for it to function? | 15:56 |
clarkb | fungi: ya that is my guess, it's doing the http post and not getting what it expects back (due to our lack of account setup?) | 15:57 |
clarkb | if we think zerossl is a better option for us we can figure that out as a next step | 15:57 |
clarkb | I've been happy with LE though and don't like changing defaults in this way | 15:57 |
corvus | maybe we can use, er... LE's python thing whatever it's called? | 15:59 |
corvus | certbot | 16:00 |
corvus | they have a container now, we can "docker run --rm" it and not worry about packaging/venv/etc | 16:00 |
clarkb | corvus: does it have coordinated dns mode? I think that was one reason we ended up on acme.sh as it allowed you to run off and set up dns yourself | 16:01 |
clarkb | then come back to it and say ok try to finish issuance | 16:01 |
corvus | clarkb: maybe this? https://certbot.eff.org/docs/using.html?highlight=dns#manual | 16:02 |
clarkb | ya it seems a bit more difficult to use their hooks since it wants to do everything in one process run | 16:02 |
*** marios|ruck is now known as marios|out | 16:02 | |
clarkb | and we rely on calling process of $tool to run off and set up dns. The inversion might be painful with ansible coordination but theoretically doable | 16:03 |
clarkb | but also acme.sh has worked well I'm not sure we need to completely reengineer this | 16:03 |
clarkb | we just need to be more explicit | 16:03 |
clarkb | ya with acme.sh the process seems to be: we run the issue command with --dns and --yes-I-know-dns-manual-mode-enough-go-ahead-please, which causes acme.sh to do as much of the acme exchange with LE as it needs to get the dns confirmation code and return it. We then take that, update our dns servers, then run acme.sh again with renew and --yes-I-know-dns-manual-mode-enough-go-ahead-please and that tells it to | 16:10 |
clarkb | pick up from the previous request | 16:10 |
clarkb | and that works out well for having ansible coordinate things across services | 16:10 |
fungi | and solves our catch-22 for needing to sometimes generate certs for names which don't yet resolve in dns or don't point to an actual running/configured server yet, and without needing something like dynamic dns update services | 16:15 |
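The two-phase flow described above can be sketched roughly as follows. This is a minimal illustration, not the code system-config actually runs: the function names, the `publish_txt` callback, and the injectable `run` parameter are all hypothetical, and passing every name with `-d` on the renew call is an assumption. Only the `--issue`, `--renew`, `--dns`, `--server`, and manual-mode acknowledgement flags come from acme.sh itself.

```python
import subprocess
from typing import Callable, List

# Acknowledgement flag quoted in the discussion; acme.sh wants it for manual DNS mode.
MANUAL_DNS_ACK = "--yes-I-know-dns-manual-mode-enough-go-ahead-please"


def _domain_args(domains: List[str]) -> List[str]:
    """Expand ["a", "b"] into ["-d", "a", "-d", "b"]."""
    args: List[str] = []
    for name in domains:
        args += ["-d", name]
    return args


def issue_argv(domains: List[str], server: str = "letsencrypt") -> List[str]:
    """Phase 1: start the order and have acme.sh report the DNS challenge values."""
    # Naming the server explicitly avoids depending on acme.sh's default CA.
    return ["acme.sh", "--server", server, "--issue", "--dns", MANUAL_DNS_ACK] + _domain_args(domains)


def renew_argv(domains: List[str], server: str = "letsencrypt") -> List[str]:
    """Phase 2: pick the pending request back up once the TXT records are published."""
    return ["acme.sh", "--server", server, "--renew", MANUAL_DNS_ACK] + _domain_args(domains)


def two_phase(domains: List[str],
              publish_txt: Callable[[str], None],
              run=subprocess.run):
    """Run phase 1, hand its output to a DNS-publishing step, then run phase 2.

    publish_txt is a hypothetical hook standing in for whatever updates the
    DNS servers (ansible, in the setup discussed here).
    """
    first = run(issue_argv(domains), capture_output=True, text=True)
    publish_txt(first.stdout)
    return run(renew_argv(domains), capture_output=True, text=True)
```

Splitting the work into two separate invocations is what lets an external coordinator do the DNS update in between, which is the property that made acme.sh attractive in this discussion.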
corvus | yeah, just saying it looks like there may be an option using the 'manual' plugin | 16:18 |
*** jpena|out is now known as jpena | 16:22 | |
*** jpena is now known as jpena|off | 16:48 | |
clarkb | zuul has +1'd https://review.opendev.org/c/opendev/system-config/+/797136 though the bulk of the change isn't actually tested since it goes through the staging path | 16:57 |
clarkb | that said I think we can probably land it then manually run the LE playbook so we don't wait for the periodic run? | 16:57 |
fungi | yeah, i've +2'd, seems fine | 16:59 |
clarkb | fungi: I've paged in some of the gerrit stuff. We have 275 conflicts identified by gerrit. ~clarkb/gerrit_user_cleanups/notes/proposed-cleanups.20210416 lists 181 email addresses and accounts. The idea is we will "retire" all of those account numbers then after some time we will run system-config/tools/remove-user-external-ids.py on proposed-cleanups.20210416 as long as users don't | 17:21 |
clarkb | complain about the retirements | 17:21 |
clarkb | fungi: the reason there isn't clear annotation for why each of those has been listed in the file is I just sort of manually went through the 275 and used manual judgement to assess which are probably ok and the reasoning behind each one may be quite specific. That said it's been long enough since I did that that going through the list again is probably worthwhile and when I do that I | 17:22 |
clarkb | can make notes for each one | 17:22 |
clarkb | fungi: ~clarkb/gerrit_user_cleanups/notes/audit-results.yaml.20210415 is the information being processed to construct proposed-cleanups.20210416 | 17:23 |
fungi | okay, so it was at least expected that there might be some duplicate ids in the list, as well as duplicate addresses | 17:23 |
clarkb | that yaml file contains a number of account state attributes that are used for making these decisions. | 17:23 |
clarkb | fungi: yup because an email address may be used by a number of accounts and need to be cleaned up across all of them and a single account may have multiple email addresses that conflict with other accounts :( | 17:24 |
fungi | yep, got it. i'll take a closer look at the input list as well, but i think i'm okay moving forward with those at this point | 17:24 |
clarkb | fungi: as far as double checking things goes I think you can look in proposed-cleanups.20210416 then cross check with audit-results.yaml.20210415 and ensure none of the account attributes in yaml indicate an account that may actually be used | 17:24 |
clarkb | spot checking is probably fine. The list is quite large | 17:25 |
fungi | yeah, that's what i was going to do for spot-checks | 17:25 |
fungi | exactly | 17:25 |
clarkb | we're also retiring first because we can undo retirement pretty easily. So start with retirement. wait a couple of weeks then do the more destructive cleanup next | 17:25 |
fungi | right | 17:25 |
clarkb | fungi: as an example spot check the first entry in the proposal is also the first entry in the yaml. That individual has two conflicting accounts, both created 5 years ago. I made a judgement call to keep the newer of the two accounts and retire and cleanup the older of the two | 17:27 |
clarkb | The reason for this is both accounts are active, neither has pushed or reviewed code, but the older account doesn't have a valid openid so we just go for it even though the older account has valid ssh keys | 17:28 |
clarkb | that individual, should they decide to show up again, will not be able to login to the older account but can log in to the newer account and add new ssh keys there | 17:28 |
clarkb | so we remove the older account | 17:28 |
fungi | yep | 17:29 |
clarkb | talking out loud about it helps build my own confidence in what I did too so thank you for listening :) | 17:30 |
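To illustrate why the same email address and the same account id can each show up more than once in a cleanup list, here is a small sketch of how conflicts could be grouped. The input shape (account id mapped to a list of email addresses) and the sample ids and addresses are invented for illustration; the real audit-results YAML is not shown in this log and may look quite different.

```python
from collections import defaultdict
from typing import Dict, List, Set


def find_email_conflicts(accounts: Dict[int, List[str]]) -> Dict[str, Set[int]]:
    """Return every email address that is claimed by more than one account."""
    by_email: Dict[str, Set[int]] = defaultdict(set)
    for account_id, emails in accounts.items():
        for email in emails:
            by_email[email].add(account_id)
    return {email: ids for email, ids in by_email.items() if len(ids) > 1}


def conflicts_per_account(conflicts: Dict[str, Set[int]]) -> Dict[int, int]:
    """Count how many conflicting addresses each account is involved in."""
    counts: Dict[int, int] = defaultdict(int)
    for ids in conflicts.values():
        for account_id in ids:
            counts[account_id] += 1
    return dict(counts)


# Hypothetical sample data: account 1001 conflicts with two different accounts.
sample = {
    1001: ["a@example.org", "b@example.org"],
    1002: ["a@example.org"],
    1003: ["b@example.org"],
    1004: ["c@example.org"],
}
conflicts = find_email_conflicts(sample)
involvement = conflicts_per_account(conflicts)
```

With this sample, account 1001 is involved in two conflicts and each conflicting address maps to two accounts, which is the kind of duplication expected in the proposed-cleanups list.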
clarkb | fungi: do we want to approve https://review.opendev.org/c/opendev/system-config/+/797136 nowish? ianw isn't around today due to timezones and is most familiar with the tools there, but I think I got it right. | 17:33 |
clarkb | fungi: we can wait for ianw to take a look in a couple of days and review it too or run it now and see what happens | 17:33 |
fungi | may as well give it a shot, approved just now | 17:37 |
clarkb | k | 17:37 |
opendevreview | Merged opendev/system-config master: Be explicit about server used in acme.sh https://review.opendev.org/c/opendev/system-config/+/797136 | 18:38 |
fungi | clarkb: ^ | 18:38 |
fungi | guess we wait for it to deploy | 18:38 |
clarkb | yup the deploy seems to be running now | 18:42 |
johnsom | Hi OpenDev. Just an FYI, I got an odd pop-up loading a gerrit patch: http://paste.openstack.org/show/806780/ | 18:42 |
johnsom | Reload had no issue. | 18:43 |
clarkb | johnsom: the backend may have timed out that request and sent an incomplete response or similar | 18:43 |
fungi | johnsom: neat, it went through on reload? the server might be struggling and returning questionable responses to apache. checking cacti graphs now | 18:44 |
johnsom | Yeah, not a big deal, just thought I would mention it in case the backend rotation has an issue. | 18:44 |
clarkb | johnsom: fwiw there is no backend rotation in this case, just a single gerrit. We are making progress on transitioning it to a larger server which we hope will ease these problems (if this is in fact related to system load/memory use) | 18:45 |
fungi | well, there's no rotation in this case | 18:45 |
fungi | er, what clarkb said | 18:45 |
johnsom | Ha, well. Ok then. I guess it's easy to know which backend it is speaking of... lol | 18:45 |
fungi | though nothing on the cacti graphs is immediately jumping out as anomalous | 18:46 |
fungi | there might have been a connectivity issue with the db since it's not on the same system... will check the gerrit error log | 18:46 |
johnsom | In case it matters for some reason in the future, this was the patch URL: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/793969 | 18:47 |
fungi | johnsom: do you have an approximate timestamp? | 18:47 |
fungi | though thanks, the request may help me narrow it down | 18:47 |
johnsom | About enough time to cut/paste into the pastebin page before 11:42 | 18:47 |
fungi | unfortunately gerrit likes to throw tons of benign java backtraces into its error log so it's the typical needle in a haystack search | 18:48 |
johnsom | (Pacific time) | 18:48 |
opendevreview | Merged zuul/zuul-jobs master: Add role to enable FIPS on a node https://review.opendev.org/c/zuul/zuul-jobs/+/788778 | 18:50 |
clarkb | review's acme log just said it verified things and the cert is available | 18:51 |
clarkb | the playbook is still running though. I'll check certs as reported by servers when the playbook is done | 18:52 |
fungi | Not After : Jul 16 05:43:27 2021 GMT | 18:52 |
fungi | on the current review.o.o https cert | 18:52 |
clarkb | fungi: ya I think handlers run last | 18:52 |
fungi | k | 18:52 |
clarkb | and we use a handler to reload apache so we need the playbook to complete to see it on the server side | 18:52 |
fungi | ahh, yep | 18:52 |
clarkb | the playbook is done now | 18:55 |
fungi | Not After : Sep 16 17:51:16 2021 GMT | 18:55 |
clarkb | boom | 18:56 |
* fungi cheers | 18:56 | |
clarkb | I'm getting a complete list of names to check out of the ansible log and will check them | 18:58 |
fungi | echo|openssl s_client -connect review.opendev.org:https 2>/dev/null|openssl x509 -text|grep -i after|cut -c25- | 18:59 |
fungi | if you want a fast check | 18:59 |
clarkb | ah yup I think I may need to do that because a number of the certs redirect to other sites and firefox won't show me the initial cert | 19:01 |
clarkb | the review cert lgtm as does the static.o.o cert and mirror.regioneone.linaro.opendev.org. I'll switch to s_client now to avoid redirects | 19:01 |
fungi | oh, that may not help if relying on sni | 19:02 |
fungi | since that's getting you the initial cert | 19:03 |
fungi | -servername name | 19:03 |
fungi | according to the s_client manpage | 19:03 |
fungi | that should get you the correct sni context | 19:03 |
* clarkb tries again | 19:04 | |
clarkb | though in this case I got different times for each one | 19:04 |
clarkb | implying they were different (maybe my s_client is new enough for sni by default?) | 19:04 |
clarkb | "If -servername is not provided, the TLS SNI extension will be populated with the name given to -connect if it follows a DNS name format." yup I should be good | 19:05 |
fungi | yeah, i just tested and the hostname i passed on -connect seems to bring up the correct cert | 19:05 |
fungi | righteous | 19:05 |
clarkb | this all looks good to me now. I'm happy we check these a month in advance :) | 19:05 |
fungi | i suppose if servername isn't in dns then you need -servername to work around t | 19:05 |
fungi | it | 19:06 |
clarkb | or if you talk to an ip address to test a specific backend or something along those lines | 19:06 |
fungi | yeah | 19:06 |
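The same expiry check, including the case where the SNI name differs from the address being connected to, can be sketched with Python's standard `ssl` module. The function names and parameters here are illustrative, not part of any existing tooling; `server_hostname` is the standard-library equivalent of s_client's `-servername`.

```python
import socket
import ssl
from typing import Optional


def fetch_not_after(host: str, port: int = 443,
                    sni_name: Optional[str] = None,
                    timeout: float = 10.0) -> str:
    """Connect to host:port and return the served certificate's notAfter string.

    sni_name plays the role of s_client's -servername: set it when host is an
    IP address or a name that does not match the site being checked.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # server_hostname sets the SNI extension and is also the name verified
        # against the certificate.
        with context.wrap_socket(sock, server_hostname=sni_name or host) as tls:
            return tls.getpeercert()["notAfter"]


def days_remaining(not_after: str, now: float) -> int:
    """Whole days between `now` (epoch seconds) and a notAfter timestamp."""
    return int((ssl.cert_time_to_seconds(not_after) - now) // 86400)
```

For example, `days_remaining(fetch_not_after("review.opendev.org"), time.time())` (after `import time`) would report how long the currently served certificate has left; that call needs network access, so it is not run here.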
fungi | johnsom: so i can find the entry in the apache error log: | 19:08 |
fungi | AH00898: Error reading from remote server returned by /changes/openstack/tripleo-heat-templates~793969/revisions/1/related | 19:08 |
fungi | no luck finding a corresponding error in the gerrit logs | 19:08 |
fungi | but thanks for pointing it out, will keep a closer eye on it | 19:08 |
johnsom | Sure, NP | 19:08 |
fungi | apache returned the error at 18:40:57.114700 | 19:08 |
fungi | utc | 19:08 |
fungi | it's possible the error in the gerrit logs is delayed by some minutes due to an internal timeout or something, but there was nothing mentioning that change or project in the flood of noise gerrit spews to its error log | 19:09 |
clarkb | fungi: johnsom I think we have a 60 second timeout in place on the proxypass directive (via apache defaults) | 19:11 |
clarkb | I would expect apache to log that the timeout was hit rather than an error though | 19:11 |
fungi | refreshing the cacti graphs, now it shows there was a fairly large but very brief spike swapping in for the sample ~10-15 minutes after that error | 19:12 |
johnsom | Yeah, I have found that to not be called out as well as I would like in Apache. | 19:12 |
fungi | also the server's load average was up a bit around the time of the error: http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=26&rra_id=all | 19:13 |
johnsom | I have spent some time tracking a uwsgi bug with custom compiled apache modules. The errors aren't always as direct to the issue as I had hoped. | 19:13 |
fungi | so maybe there was something going on which slowed the server response beyond apache's proxy timeout tolerance | 19:14 |
fungi | load average around 15 for a server with 16 vcpus though, not really substantial | 19:14 |
johnsom | In Designate, with uwsgi proxied through apache, we get a pipe closed error randomly in the apache log. I still need to put more time into that. | 19:15 |
fungi | anyway, snmp is being polled every 4 minutes, so the load average might have spiked much higher than 15 in that timespan | 19:15 |
fungi | er, every 5 minutes | 19:15 |
clarkb | TIL about pip install --ignore-installed. I wish there was a way to instruct --ignore-installed to leave system packages alone, but I'm not sure it can really distinguish without a bunch of cases for linux and other OSes | 19:16 |
fungi | pip doesn't want to have to care about things it didn't install. the pip maintainers still consider `sudo pip` to be a case of "you're doing it wrong" | 19:17 |
clarkb | yup, I know. Just wondering if there was a better way to address https://review.opendev.org/c/openstack/devstack/+/797069/3/tools/fixup_stuff.sh without rewriting devstack to use a virtualenv (which has been tried numerous times and failed for various reasons) | 19:18 |
fungi | but probably worth trying again | 19:18 |
clarkb | ya the biggest hurdle at this point is grenade iirc. That and maybe a functional job or two (ironic and swift I think) that make assumptions about the installation that don't hold if moved to a virtualenv | 19:19 |
fungi | there will come a time when the pip maintainers are going to be all "yeah we're just going to add a check in pip to see if it's being run as root, and then exit 1" | 19:19 |
fungi | the discussion has come down to the pip maintainers on one side who don't want to support pip installing into system-wide paths, and the distro package maintainers on the other side who don't want pip installing into system-wide paths | 19:20 |
fungi | so anything relying on `sudo pip install` is running on borrowed time at this point | 19:21 |
clarkb | to be fair /usr/local is intended for this purpose isn't it? But maybe that escape hatch isn't sufficient for modern software and we need to move beyond it | 19:22 |
clarkb | (once upon a time in a galaxy far far away we nfs mounted /usr/local on sparc solaris to provide a set of gnu tools because the solaris ones lacked features and compatibility with a bunch of stuff) | 19:22 |
fungi | that works well as long as your sparc /usr/local wasn't also mounted on your x86 machines | 19:23 |
fungi | or you didn't mount the sparc64 /usr/local to 32-bit sparc hosts | 19:23 |
clarkb | ya we discovered that the hard way when we got a few x86 solaris machines. Had to do smarter nfs mounts after that | 19:23 |
clarkb | not sure what I would've done without those gnu tools though. Probably argued for installing linux on the machines >_> | 19:24 |
fungi | but yes, i worked a site which did rootnfs boot with sun workstations, it gets fun when your architectures begin to vary | 19:24 |
clarkb | melwitt: the latest version of https://review.opendev.org/c/opendev/jeepyb/+/795912/ looks great. I wish the gerrit api existed and was as useful when we first wrote tools for this :) | 20:13 |
opendevreview | Merged opendev/jeepyb master: Convert update_blueprint to use the Gerrit REST API https://review.opendev.org/c/opendev/jeepyb/+/795912 | 20:47 |
opendevreview | Merged opendev/git-review master: Fix nodeset selections for zuul jobs https://review.opendev.org/c/opendev/git-review/+/796754 | 21:04 |
corvus | masterpe: you look like a normal user to me here :) | 21:12 |
masterpe[m] | yep | 21:12 |
*** ChanServ sets mode: +o corvus | 21:13 | |
corvus | ... | 21:17 |
*** ChanServ sets mode: -o corvus | 21:18 | |
*** ChanServ sets mode: +o corvus | 21:19 | |
corvus | okay, irc and matrix agree about my mod status here now (though it took an on/off cycle for that to happen, probably because of my kick the other day) | 21:20 |
*** ChanServ sets mode: -o corvus | 21:22 | |
ianw | clarkb: thanks for updating acme.sh. switching the default feels odd and i hope it doesn't suggest anything weird going on in the background (a bit burnt by recent irc :) | 22:29 |
ianw | https://github.com/containers/podman/issues/10717 suggests we've done something to the capabilities of shadow-utils in our fedora 34 installs. i don't imagine anyone is going to be jumping to debug that, it's on my todo list | 22:51 |
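
A note on the s_client exchange around 19:05: the quoted man page text says SNI is filled in from `-connect` when that value looks like a DNS name, and fungi and clarkb point out that `-servername` is still needed when the name is not in DNS or when connecting to an IP address to test a specific backend. Below is a minimal sketch of that decision in Python. It only builds the argument list and does not run openssl. The helper name `s_client_command` and the address `203.0.113.10` (a documentation-range IP) are hypothetical and chosen for illustration; `-connect` and `-servername` are the flags named in the log.

```python
import shlex


def s_client_command(connect_host, port=443, servername=None):
    """Build an `openssl s_client` argument list (hypothetical helper).

    If servername is None, no -servername flag is added and openssl
    applies the default quoted in the log: SNI is taken from the
    -connect value when it looks like a DNS name.
    """
    cmd = ["openssl", "s_client", "-connect", f"{connect_host}:{port}"]
    if servername is not None:
        # Needed when connecting by IP or by a name that is not in DNS.
        cmd += ["-servername", servername]
    return cmd


# Hostname on -connect: SNI is populated by default.
print(shlex.join(s_client_command("review.opendev.org")))

# Hypothetical backend IP: pass the name explicitly to get the right cert.
print(shlex.join(s_client_command("203.0.113.10", servername="review.opendev.org")))
```

The second form is the one to reach for when checking a single backend behind a load balancer, since the IP alone gives the server no name to select a certificate with.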