opendevreview | Merged opendev/infra-specs master: Add Matrix spec https://review.opendev.org/c/opendev/infra-specs/+/796156 | 00:11 |
---|---|---|
opendevreview | Ian Wienand proposed openstack/project-config master: Retire ara projects https://review.opendev.org/c/openstack/project-config/+/777675 | 01:10 |
opendevreview | Merged opendev/system-config master: Point cacti at review02 explicitly https://review.opendev.org/c/opendev/system-config/+/801399 | 01:16 |
opendevreview | Merged openstack/project-config master: Update ceph grafana for the current jobs https://review.opendev.org/c/openstack/project-config/+/800116 | 01:21 |
ianw | those "view on github.com" links are pretty handy for https://cs.opensource.google/gerrit | 01:26 |
opendevreview | Merged openstack/diskimage-builder master: Update IRC networks https://review.opendev.org/c/openstack/diskimage-builder/+/801758 | 02:06 |
*** bhagyashris__ is now known as bhagyashris | 04:41 | |
*** marios is now known as marios|ruck | 05:20 | |
*** amoralej|off is now known as amoralej | 06:56 | |
*** marios|ruck is now known as marios | 07:01 | |
*** marios is now known as marios|ruck | 07:05 | |
*** rpittau|afk is now known as rpittau | 07:35 | |
*** marios_ is now known as marios | 08:06 | |
*** marios is now known as marios|ruck | 08:08 | |
*** ykarel is now known as ykarel|lunch | 09:07 | |
*** ykarel|lunch is now known as ykarel | 10:33 | |
*** mgoddard- is now known as mgoddard | 12:18 | |
*** amoralej is now known as amoralej|lunch | 13:09 | |
*** amoralej|lunch is now known as amoralej | 14:01 | |
*** rpittau is now known as rpittau|afk | 14:13 | |
clarkb | mnaser: did you still want to reboot the gerrit server at some point? | 14:35 |
clarkb | Looks like my held node for the gerrit functional testing of my fix did hold as expected. I'll run through some testing on that as soon as my morning meeting is over. | 14:53 |
*** ykarel is now known as ykarel|away | 14:56 | |
corvus | clarkb: do you want to look at topic:matrix today? | 15:13 |
corvus | tristanC: in https://review.opendev.org/800506 i think you have gerritbot joining 2 rooms that don't exist -- what will happen there? | 15:13 |
clarkb | corvus: added to me todo list | 15:15 |
clarkb | s/me/my/ <- typing is hard | 15:16 |
corvus | bit early for talk like a pirate day | 15:16 |
fungi | arr | 15:16 |
corvus | arr | 15:16 |
opendevreview | Merged openstack/project-config master: tripleo-common-tempest-plugin - Step 2: End project Gating https://review.opendev.org/c/openstack/project-config/+/800154 | 15:16 |
fungi | where i live, every day is talk like a pirate day | 15:16 |
corvus | fungi: iiuc, you live among the ghosts of basically all the pirates | 15:17 |
fungi | yep, and our local museum has artifacts now confirmed to be from the remains of blackbeard's wreck of the queen anne's revenge just off the coast | 15:18 |
clarkb | and now there is a new video game where you can take over fungi's island | 15:18 |
fungi | there is? | 15:18 |
clarkb | New World <- Amazon's first video game. It's actually set in a fictional land but heavily based on colonial americas | 15:19 |
fungi | ahh, got it. will have to check that out | 15:19 |
corvus | i just added the dnskey to the registrar for gating.dev | 15:21 |
tristanC | corvus: the bot aborts if it can't join the rooms | 15:30 |
clarkb | it won't create the channels then? | 15:32 |
corvus | tristanC: it will exit completely? | 15:33 |
corvus | (or will it just not join those rooms?) | 15:33 |
tristanC | corvus: it prints the invalid rooms and exits | 15:35 |
tristanC | corvus: would you prefer another behavior? | 15:35 |
corvus | no, i think that's fine. i do think that means we should revise that patch to only include #test for now | 15:36 |
corvus | tristanC: ^ you want to make that change? or i can | 15:43 |
opendevreview | Tristan Cacqueray proposed opendev/system-config master: Run matrix-gerritbot on eavesdrop https://review.opendev.org/c/opendev/system-config/+/800506 | 15:43 |
tristanC | corvus: here it is, then there will be a warning about the ssh connection, which should be retried infinitely until we provide a valid key | 15:45 |
corvus | tristanC: oh is it not using the existing key? | 15:47 |
corvus | it looks like gerritbotsshkey is the value we're using for the existing gerritbot, so i think the patch should work as written | 15:48 |
corvus | also, sorry i still haven't fixed the weechat underscore thing | 15:51 |
corvus | but you get the idea | 15:51 |
clarkb | the fedora mirror has grown by 120GB in the last day completely wiping out all improvements the yum-puppetlabs trimming did :/ | 15:51 |
clarkb | I think we need to reduce mirror.fedora's quota significantly to prevent that mirror from filling our disk | 15:52 |
clarkb | also looks like we may not have released in a few days? I wonder if it is doing some giant sync? | 15:53 |
fungi | seems likely, and we delete after | 15:55 |
fungi | so massive churn and a timed out rsync could explain the sudden growth | 15:56 |
fungi | or do we delete after? now i need to check that assertion | 15:56 |
clarkb | I'm not sure, but that could explain it | 15:56 |
fungi | i'm wrong, we just use --delete which i think deletes first | 15:57 |
clarkb | basically all new packages for a glibc recompile or whatever, we grow then delete? | 15:57 |
fungi | quick batman, to the manpage! | 15:57 |
clarkb | I'm just worried that if the trend holds our vicepa will be full in a few days | 15:57 |
fungi | "if none of the --delete-WHEN options are specified, rsync will choose the --delete-during algorithm when talking to rsync 3.0.0 or newer, and the --delete-before algorithm when talking to an older rsync." [also sprach man rsync(1)] | 16:00 |
fungi | so hard to know which it's doing without figuring out what rsync version is serving what we're copying | 16:01 |
fungi | and /var/log/rsync-mirrors/fedora.log doesn't seem to indicate | 16:02 |
fungi | though the log does show it deleting tons of files | 16:02 |
fungi | and it looks like the deletes are logged prior to the copies | 16:03 |
*** marios|ruck is now known as marios | 16:03 | |
fungi | so there goes my theory | 16:03 |
clarkb | I think if we set the fedora quota to 600GB that would mimic other distros and prevent it from filling vicepa | 16:04 |
clarkb | (if I've done my math properly) | 16:04 |
clarkb | and give it another ~110GB to grow into | 16:04 |
fungi | that seems like a pragmatic choice for now until we can figure out what's happening there | 16:04 |
clarkb | ok I'll do that after this meeting | 16:05 |
*** marios is now known as marios|out | 16:08 | |
fungi | it looks like we started syncing a bunch of churn for fedora on 2021-07-19 around the middle of the utc day, looking at our log. maybe new point releases of f32 and f33? | 16:10 |
clarkb | thats when our last vos release happened too so wouldn't surprise me if ya we started a very long update since then? | 16:11 |
clarkb | Is it possible the reboots interrupted that too or is the sync still running? | 16:11 |
fungi | sync is still going. 15:00:35 utc today it tried to resume a sync for updates/32 which was killed (presumably by timeout) at 15:30:39 utc | 16:13 |
fungi | after copying a bunch of new files | 16:13 |
fungi | another possibility is that fedora has rearranged their deck chairs | 16:14 |
clarkb | I have done the quota update | 16:14 |
clarkb | that should hopefully be plenty of room to grow while also avoiding filling the disk and disrupting others if it isn't | 16:14 |
fungi | now that the quota's done, i'm going to manually take the lock in a screen session and start a sync without any timeout | 16:14 |
fungi | starting this in a root screen session now: NO_TIMEOUT=1 flock -n /var/run/fedora-mirror.lock fedora-mirror-update mirror.fedora 2>&1 | tee /var/log/rsync-mirrors/fedora.log | 16:17 |
fungi | that's on mirror-update.o.o, obviously | 16:18 |
fungi | ianw: "Backups failed on host gitea01 at Fri Jul 23 05:56:42 UTC 2021." :/ | 16:19 |
fungi | seems the reboot didn't fix it for long | 16:19 |
clarkb | if anyone is wondering you really need to update the canonical web url on your held testing gerrits in order to test log in stuff. Otherwise you get redirected to prod and it complains there and logs you out | 16:25 |
clarkb | Maybe we should just bake that into our testing images for simplicity? | 16:25 |
clarkb | (I set it up as reviewtesting.opendev.org in /etc/hosts and in the canonical web url then things work) | 16:25 |
fungi | makes sense, yeah i'd support that change | 16:26 |
fungi | i guess the alternative is to change your /etc/hosts to associate the production hostname with the held node's ip address? | 16:27 |
fungi | but that does make it hard to also use the production system from the same machine where you're also connecting to the held node to test it | 16:28 |
clarkb | ya and that will probably confuse the browsers too due to cookies | 16:28 |
*** amoralej is now known as amoralej|off | 16:29 | |
clarkb | ok https://gerrit-review.googlesource.com/c/gerrit/+/312302 has been updated to indicate I have tested the latest +2'd version of the change | 16:35 |
clarkb | functionally tested it I mean. There are unittests in the change | 16:36 |
clarkb | also added a note on https://review.opendev.org/c/opendev/system-config/+/800832 that if necessary we can make that stop failing in testinfra artificially and carry the patch ourselves locally. Though I expect I will abandon that change as soon as upstream lands me change | 16:40 |
clarkb | fungi: re gitea backups i don't think the reboot fixed it. Ping was never the issue it was richer protocols | 16:47 |
clarkb | fungi: basically the ping test we did post reboot wasn't sufficient to check it | 16:47 |
fungi | ahh, you're right. it was hping or telnet i was testing with to reproduce the failure | 16:48 |
fungi | or mtr's tcp mode maybe | 16:48 |
clarkb | I keep doing me instead of my | 16:49 |
clarkb | have I become a pirate? | 16:49 |
fungi | this rsync seems to be progressing fairly quickly. i have a feeling the mirror is so large and consists of so many files that the overhead of rsync scanning everything on both sides to work out where it left off so it can resume eats into much of the timeout, leaving little time to actually make progress on each run | 16:55 |
fungi | i wouldn't be surprised if this finishes in a matter of a few hours | 16:55 |
fungi | assuming it doesn't run out of space, that is | 16:55 |
clarkb | fyi I've responded to https://github.com/openstack/diskimage-builder/pull/27 pointing them at gerrit and docs for pushing to gerrit | 16:56 |
clarkb | I did not close the PR because I'm wondering if our PR closer is still working | 16:56 |
clarkb | I don't know what became of that after the dockerization of gerrit and change to mirroring configs | 16:57 |
fungi | looks like the fedora mirror sync is on to its vos release phase now | 18:16 |
opendevreview | Merged openstack/project-config master: Remove noop jobs for deprecated os-panko https://review.opendev.org/c/openstack/project-config/+/799808 | 18:25 |
opendevreview | Merged openstack/project-config master: Retire django-openstack-auth https://review.opendev.org/c/openstack/project-config/+/800532 | 18:27 |
fungi | clarkb: now i'm starting to wonder if the increase in volume usage was actually divergence between the rw and ro volumes | 18:59 |
fungi | if so we should see it drop after the vos release completes | 18:59 |
clarkb | fungi: oh interesting I guess it has to keep the copies of the old stuff until it successfully releases | 19:37 |
fungi | right, and that part is... taking a while | 19:38 |
fungi | lots of data to transfer apparently, vos release is still in progress | 19:38 |
opendevreview | Clark Boylan proposed opendev/system-config master: DNM test the rename_repos playbook https://review.opendev.org/c/opendev/system-config/+/802112 | 20:15 |
clarkb | fungi: corvus: ^ fyi thats the hacked together test change I've got for testing rename_repos | 20:15 |
clarkb | (and why the rename use case came up in my head for the zk key management) | 20:15 |
clarkb | its a bit hacked together because the test envs and the prod envs don't all align on their projects and ssh keys | 20:16 |
clarkb | if you take a look at that it would probably be good to ensure I haven't missed anything obvious in the porting to ensure the testing there is as valid as possible. eg no null success cases | 20:18 |
clarkb | ok time to find a spot outside in the shade and review topic:matrix | 20:28 |
fungi | debian bullseye release date announced just now as 2021-08-14 | 20:33 |
fungi | so roughly 3 weeks | 20:33 |
corvus | i hope they hit their target | 20:34 |
fungi | hah | 20:34 |
clarkb | corvus: left some notes on https://review.opendev.org/c/opendev/system-config/+/800317 nothing super critical but I think a couple of them may be worth addressing (particularly around reconnecting) | 21:03 |
clarkb | corvus: another question I've got is if your eavesdrop bot uses a token or an actual password? It seems the preference is to use tokens? not sure if they are equivalent in the code | 21:17 |
corvus | clarkb: replied with a followup q. | 21:18 |
corvus | clarkb: tokens are obtained with a password; the bot does that. | 21:19 |
corvus | (you establish a session with a password, the session is keyed with a token, and you use that session forever) | 21:19 |
clarkb | corvus: ok, I noticed because gerritbot wants the token not a password | 21:20 |
clarkb | tristanC: corvus left some thoughts on https://review.opendev.org/c/opendev/system-config/+/800506 as well | 21:20 |
clarkb | corvus: and responded to your question | 21:22 |
corvus | clarkb: i don't know the answer re gerritbot. if there's some out-of-band process to establish a session and get the token, i'm not sure i'm a fan of that. i think the bot should obtain the token itself and store it locally. that makes the entire process self-bootstrapping, testable, and can support disaster recovery. | 21:24 |
clarkb | corvus: I think there is an out of band process where you can get one. I recall I got one when I created the admin user on the homeserver | 21:26 |
opendevreview | James E. Blair proposed opendev/system-config master: Add matrix-eavesdrop container image https://review.opendev.org/c/opendev/system-config/+/800317 | 21:28 |
corvus | clarkb: ^ | 21:28 |
clarkb | +2 thanks | 21:29 |
corvus | clarkb: you could run a curl command to log in and then grab that token. i figured it's friendlier to have the program do that. | 21:31 |
clarkb | ya I wonder if the idea is that tokens can have more limited permissions so you want to prefer tokens for security purposes? but if the account is already limited in its abilities... | 21:31 |
corvus | clarkb: i don't agree that tokens are preferred | 21:32 |
corvus | i mean, they are required to use the api -- that's just how the api works | 21:32 |
corvus | the question is, what is the input to the application? a password, or a session token. | 21:33 |
corvus | they are exactly equivalent from a security pov | 21:33 |
clarkb | corvus: ya I'm not asserting that just wondering if that may be part of the consideration. | 21:33 |
clarkb | I guess this would be good feedback to tristanC's gerritbot then? | 21:33 |
corvus | sorry, i thought i saw you say they were preferred | 21:33 |
corvus | in every case, here's the order of operations: 1) send username/password to server in order to obtain session token. 2) save session token 3) use that session token forever to interact with the server. | 21:34 |
corvus | one choice is to have steps 1,2,3 performed by the bot. | 21:35 |
corvus | another choice is to have steps 1,2 performed by humans (and step 2 is store the token in private ansible vars) and step 3 is performed by the bot | 21:35 |
corvus | so i think which is preferred has to do with what kind of persistent storage is available to the bot, what its lifecycle is, etc. | 21:36 |
corvus | (and other automation considerations around it) | 21:36 |
corvus | if the bot has no persistent storage and is itself considered ephemeral, then it's probably better to give the bot a token rather than a password, because it would be establishing sessions all the time | 21:37 |
corvus | but at least bots that receive messages need to store data to checkpoint their syncs, so storing a token is no big deal. | 21:38 |
clarkb | gotcha | 21:38 |
corvus | if gerritbot has no local state whatsoever, and would need to add it merely for the purpose of storing the session token, then that would be a pretty good reason to consider the token-as-input approach | 21:39 |
clarkb | and since gerritbot is more map gerrit stream into matrix it is far more ephemeral and doesn't necessarily need storage (though as implemented for us it does) | 21:39 |
corvus | if it's storing state anyway, then i'd argue password-as-input is more op-friendly | 21:39 |
clarkb | it's got the yaml dhall config stuff which I suppose is more an implementation detail than process requirement | 21:40 |
clarkb | but is state in our case | 21:40 |
corvus | to be clear, i'm okay with token-as-input if that's the way it's written; though we should find out what input we need to provide. :) | 21:43 |
corvus | (i have a preference; it's not a strong one, and this is an opportunity to gain experience) | 21:43 |
fungi | purely from a security pov, i agree if the token has the same privileges as the granting account then there's no security-related reason to avoid giving the account credentials to the application rather than manually issuing a token for it | 21:57 |
fungi | tokens have gained popularity with online services where one account may grant (and perhaps also later revoke) multiple limited-scope tokens for automation | 21:58 |
fungi | since we can afford to have a separate account for each application we can simply invalidate its account when we no longer need it | 21:58 |
fungi | and simplify things in the process since there's only one set of credentials to store rather than two | 21:59 |
clarkb | 22:01 | |
clarkb | oops | 22:01 |
fungi | do or do not, there is no oops | 22:02 |
clarkb | I migrated back inside from the shade because the day star moved far enough down towards the horizon to remove my shade and bbq my knees | 22:02 |
clarkb | on resume from suspend I derped the reconnection | 22:03 |
fungi | bbq knees are a thing here in the south. along with knuckles, scrapple and head cheese | 22:03 |
fungi | 2021-07-23 20:57:10 | Released volume mirror.fedora successfully | 22:05 |
fungi | i'm starting a second run now in the same screen session just to make sure it's ~ a no-op | 22:05 |
opendevreview | James E. Blair proposed opendev/system-config master: Run matrix-eavesdrop on eavesdrop https://review.opendev.org/c/opendev/system-config/+/800320 | 22:18 |
opendevreview | James E. Blair proposed opendev/system-config master: Run matrix-gerritbot on eavesdrop https://review.opendev.org/c/opendev/system-config/+/800506 | 22:18 |
corvus | those are just rebases on the updated first change | 22:18 |
clarkb | wow https://review.opendev.org/c/opendev/system-config/+/802112 got a +1 from zuul I really didn't expect that | 22:20 |
clarkb | that implies the rename playbook is working as expected against gitea and gerrit | 22:20 |
clarkb | fungi: ^ you might want to look that over since it is related to the planned renaming | 22:24 |
clarkb | fungi: but skimming the job logs it does seem to have run on gerrit and done the rename then test infra checked it after | 22:24 |
clarkb | also I think ansible is double logging things did newer ansible start doing more verbose logging? | 22:27 |
clarkb | I think the gitea job also did well I see the transfer of orgs for my test and it got the expected http 302 response | 22:30 |
fungi | clarkb: yeah 802112 looks to me like it did the thing. can we add a permanent test like that? | 22:41 |
clarkb | fungi: I think we can but we need to converge the prod env and the test envs a bit more so that we can use a consistent ssh key and user | 22:42 |
clarkb | fungi: I think that is possible we'll want to change the name of the admin user in the test env and have it use the ssh key for gerrit2? something like that | 22:42 |
clarkb | feel free to push new patchsets that converge it a bit more. otherwise I'll try to work on that next week | 22:43 |
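
Editor's note on the 16:14-16:17 exchange: fungi takes the mirror lock with `flock -n` before starting the manual sync. With `-n`, flock fails immediately instead of waiting when another process already holds the lock, so a manual run and a scheduled run cannot overlap. The same pattern can be sketched in Python with the standard `fcntl` module. This is only an illustration: the function name `try_lock` and the temporary lock path are hypothetical and are not taken from the OpenDev tooling.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Return an open file holding an exclusive lock, or None if it is held.

    Mirrors the behaviour of `flock -n`: never wait for the lock.
    """
    handle = open(path, "w")
    try:
        # LOCK_NB makes the call fail at once instead of blocking.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


if __name__ == "__main__":
    # Hypothetical lock path for the demo, not the real mirror lock file.
    lock_path = os.path.join(tempfile.mkdtemp(), "demo-mirror.lock")
    held = try_lock(lock_path)
    print("first attempt got the lock:", held is not None)
    print("second attempt got the lock:", try_lock(lock_path) is not None)
    held.close()  # closing the file releases the lock
```

Closing the file (or exiting the process) releases the lock, so a killed sync does not leave a stale lock behind.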
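
Editor's note on the 21:34-21:39 exchange: corvus describes three steps (log in with a password to obtain a session token, save the token, reuse it) and prefers that the bot perform all three itself. A minimal sketch of that password-as-input approach follows. It assumes the bot has a writable state file; the names `get_token` and `matrix_password_login` and the JSON state layout are hypothetical and are not taken from matrix-eavesdrop or matrix-gerritbot. The login request body follows the Matrix client-server password login as the editor understands it, and the endpoint version (`v3` here) may differ on a given homeserver.

```python
import json
import os
import urllib.request


def matrix_password_login(homeserver, user, password):
    """Step 1: exchange a username and password for a session token.

    Assumed endpoint and body shape; check them against the homeserver in use.
    """
    body = json.dumps({
        "type": "m.login.password",
        "identifier": {"type": "m.id.user", "user": user},
        "password": password,
    }).encode()
    request = urllib.request.Request(
        homeserver + "/_matrix/client/v3/login",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["access_token"]


def get_token(state_path, login):
    """Steps 2 and 3: reuse a stored token, logging in only when none exists.

    `login` is any zero-argument callable that returns a token.
    """
    if os.path.exists(state_path):
        with open(state_path) as state:
            return json.load(state)["access_token"]
    token = login()
    # Request owner-only permissions since the token is a credential.
    fd = os.open(state_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as state:
        json.dump({"access_token": token}, state)
    return token
```

A bot would call something like `get_token(path, lambda: matrix_password_login(url, user, password))` at startup. With the token-as-input alternative, a human performs steps 1 and 2 and the bot only ever reads the stored token.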