Friday, 2021-07-23

opendevreviewMerged opendev/infra-specs master: Add Matrix spec
opendevreviewIan Wienand proposed openstack/project-config master: Retire ara projects
opendevreviewMerged opendev/system-config master: Point cacti at review02 explicitly
opendevreviewMerged openstack/project-config master: Update ceph grafana for the current jobs
ianwthose "view on" links are pretty handy for
opendevreviewMerged openstack/diskimage-builder master: Update IRC networks
*** bhagyashris__ is now known as bhagyashris04:41
*** marios is now known as marios|ruck05:20
*** amoralej|off is now known as amoralej06:56
*** marios|ruck is now known as marios07:01
*** marios is now known as marios|ruck07:05
*** rpittau|afk is now known as rpittau07:35
*** marios_ is now known as marios08:06
*** marios is now known as marios|ruck08:08
*** ykarel is now known as ykarel|lunch09:07
*** ykarel|lunch is now known as ykarel10:33
*** mgoddard- is now known as mgoddard12:18
*** amoralej is now known as amoralej|lunch13:09
*** amoralej|lunch is now known as amoralej14:01
*** rpittau is now known as rpittau|afk14:13
clarkbmnaser: did you still want to reboot the gerrit server at some point?14:35
clarkbLooks like my held node for the gerrit functional testing of my fix did hold as expected. I'll run through some testing on that as soon as my morning meeting is over.14:53
*** ykarel is now known as ykarel|away14:56
corvusclarkb: do you want to look at topic:matrix today?15:13
corvustristanC: in i think you have gerritbot joining 2 rooms that don't exist -- what will happen there?15:13
clarkbcorvus: added to me todo list15:15
clarkbs/me/my/ <- typing is hard15:16
corvusbit early for talk like a pirate day15:16
opendevreviewMerged openstack/project-config master: tripleo-common-tempest-plugin - Step 2: End project Gating
fungiwhere i live, every day is talk like a pirate day15:16
corvusfungi: iiuc, you live among the ghosts of basically all the pirates15:17
fungiyep, and our local museum has artifacts now confirmed to be from the remains of blackbeard's wreck of the queen anne's revenge just off the coast15:18
clarkband now there is a new video game where you can take over fungi's island15:18
fungithere is?15:18
clarkbNew World <- Amazon's first video game. It's actually set in a fictional land but heavily based on the colonial Americas15:19
fungiahh, got it. will have to check that out15:19
corvusi just added the dnskey to the registrar for gating.dev15:21
tristanCcorvus: the bot aborts if it can't join the rooms15:30
clarkbit won't create the channels then?15:32
corvustristanC: it will exit completely?15:33
corvus(or will it just not join those rooms?)15:33
tristanCcorvus: it prints the invalid rooms and exits15:35
tristanCcorvus: would you prefer another behavior?15:35
corvusno, i think that's fine.  i do think that means we should revise that patch to only include #test for now15:36
corvustristanC: ^ you want to make that change?  or i can15:43
opendevreviewTristan Cacqueray proposed opendev/system-config master: Run matrix-gerritbot on eavesdrop
tristanCcorvus: here it is, then there will be a warning about the ssh connection, which should be retried infinitely until we provide a valid key15:45
corvustristanC: oh is it not using the existing key?15:47
corvusit looks like gerritbotsshkey is the value we're using for the existing gerritbot, so i think the patch should work as written15:48
corvusalso, sorry i still haven't fixed the weechat underscore thing15:51
corvusbut you get the idea15:51
clarkbthe fedora mirror has grown by 120GB in the last day, completely wiping out all improvements the yum-puppetlabs trimming did :/15:51
clarkbI think we need to reduce mirror.fedora's quota significantly to prevent that mirror from filling our disk15:52
clarkbalso looks like we may not have released in a few days? I wonder if it is doing some giant sync?15:53
fungiseems likely, and we delete after15:55
fungiso massive churn and a timed out rsync could explain the sudden growth15:56
fungior do we delete after? now i need to check that assertion15:56
clarkbI'm not sure, but that could explain it15:56
fungii'm wrong, we just use --delete which i think deletes first15:57
clarkbbasically all new packages for a glibc recompile or whatever, we grow then delete?15:57
fungiquick batman, to the manpage!15:57
clarkbI'm just worried that if the trend holds our vicepa will be full in a few days15:57
fungi"if none of the --delete-WHEN options are specified, rsync will choose the --delete-during algorithm when talking to rsync 3.0.0 or newer, and the --delete-before algorithm when talking to an older rsync." [also sprach man rsync(1)]16:00
fungiso hard to know which it's doing without figuring out what rsync version is serving what we're copying16:01
fungiand /var/log/rsync-mirrors/fedora.log doesn't seem to indicate16:02
fungithough the log does show it deleting tons of files16:02
fungiand it looks like the deletes are logged prior to the copies16:03
*** marios|ruck is now known as marios16:03
fungiso there goes my theory16:03
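A small sketch of the rule fungi quotes from rsync(1), modelling nothing beyond that sentence: the version tuples are hypothetical inputs, and the function does not contact any mirror.

```python
from typing import Tuple


def default_delete_mode(remote_version: Tuple[int, int, int]) -> str:
    """Model rsync's choice for a bare --delete with no --delete-WHEN option.

    Per rsync(1): --delete-during when the other side runs rsync 3.0.0 or
    newer, --delete-before when it runs anything older.
    """
    if remote_version >= (3, 0, 0):
        return "--delete-during"
    return "--delete-before"


if __name__ == "__main__":
    # Hypothetical remote versions, not ones observed on the fedora mirror.
    for version in [(2, 6, 9), (3, 0, 0), (3, 2, 3)]:
        print(".".join(str(part) for part in version), "->", default_delete_mode(version))
```

One caveat when reading the log: rsync(1) describes --delete-during as doing each directory's delete scan right before that directory is checked for updates, so deletions logged ahead of copies are consistent with either --delete-during or --delete-before; they only rule out a delete-after style mode.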
clarkbI think if we set the fedora quota to 600GB that would mimic other distros and prevent it from filling vicepa16:04
clarkb(if I've done my math properly)16:04
clarkband give it another ~110GB to grow into16:04
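The quota arithmetic written out, under two loud assumptions: "GB" is treated as GiB, and current usage is inferred as the 600GB quota minus the ~110GB of headroom clarkb mentions, not measured. The OpenAFS `fs setquota` command interprets a plain integer `-max` value as kilobyte blocks, hence the conversion.

```python
# Arithmetic for the proposed mirror.fedora quota. Assumptions: "GB" means GiB,
# and current usage is inferred from the stated headroom, not measured.
KIB_PER_GIB = 1024 * 1024


def afs_quota_kb(gib: int) -> int:
    """Convert a GiB quota to the kilobyte-block value `fs setquota -max` expects."""
    return gib * KIB_PER_GIB


def headroom_gib(quota_gib: int, used_gib: int) -> int:
    """Space the volume can still grow before hitting its quota."""
    return quota_gib - used_gib


if __name__ == "__main__":
    quota_gib = 600
    implied_used_gib = quota_gib - 110  # inferred from "another ~110GB to grow into"
    print("fs setquota -max:", afs_quota_kb(quota_gib))
    print("implied usage (GiB):", implied_used_gib)
    print("headroom (GiB):", headroom_gib(quota_gib, implied_used_gib))
```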
fungithat seems like a pragmatic choice for now until we can figure out what's happening there16:04
clarkbok I'll do that after this meeting16:05
*** marios is now known as marios|out16:08
fungiit looks like we started syncing a bunch of churn for fedora on 2021-07-19 around the middle of the utc day, looking at our log. maybe new point releases of f32 and f33?16:10
clarkbthat's when our last vos release happened too, so it wouldn't surprise me if we started a very long update since then?16:11
clarkbIs it possible the reboots interrupted that too or is the sync still running?16:11
fungisync is still going. 15:00:35 utc today it tried to resume a sync for updates/32 which was killed (presumably by timeout) at 15:30:39 utc16:13
fungiafter copying a bunch of new files16:13
fungianother possibility is that fedora has rearranged their deck chairs16:14
clarkbI have done the quota update16:14
clarkbthat should hopefully be plenty of room to grow while also avoiding filling the disk and disrupting others if it isn't16:14
funginow that the quota's done, i'm going to manually take the lock in a screen session and start a sync without any timeout16:14
fungistarting this in a root screen session now: NO_TIMEOUT=1 flock -n /var/run/fedora-mirror.lock fedora-mirror-update mirror.fedora 2>&1 | tee /var/log/rsync-mirrors/fedora.log16:17
fungithat's on mirror-update.o.o, obviously16:18
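What `flock -n` contributes to fungi's command, sketched in Python with `fcntl.flock`: take an exclusive lock without waiting, and give up at once if someone else already holds it. This is presumably how the scheduled fedora-mirror-update run and the manual one in screen avoid overlapping. Assumes Linux or another system with flock(2); the demo lock lives in a temp directory rather than /var/run.

```python
import fcntl
import os
import tempfile


def try_lock(path: str):
    """Return an open file holding an exclusive lock, or None if it is taken."""
    handle = open(path, "a")
    try:
        # LOCK_NB is the non-blocking part, matching flock(1)'s -n option.
        fcntl.flock(handle.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle  # the lock is held until this file is closed


if __name__ == "__main__":
    lock_path = os.path.join(tempfile.mkdtemp(), "fedora-mirror.lock")
    first = try_lock(lock_path)
    second = try_lock(lock_path)  # separate open file, so the locks conflict
    print("first acquired:", first is not None)
    print("second acquired:", second is not None)
    first.close()
```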
fungiianw: "Backups failed on host gitea01 at Fri Jul 23 05:56:42 UTC 2021." :/16:19
fungiseems the reboot didn't fix it for long16:19
clarkbif anyone is wondering you really need to update the canonical web url on your held testing gerrits in order to test log in stuff. Otherwise you get redirected to prod and it complains there and logs you out16:25
clarkbMaybe we should just bake that into our testing images for simplicity?16:25
clarkb(I set it up as in /etc/hosts and in the canonical web url then things work)16:25
fungimakes sense, yeah i'd support that change16:26
fungii guess the alternative is to change your /etc/hosts to associate the production hostname with the held node's ip address?16:27
fungibut that does make it hard to also use the production system from the same machine where you're also connecting to the held node to test it16:28
clarkbya and that will probably confuse the browsers too due to cookies16:28
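One way the "bake it into our testing images" idea could look, as a hypothetical helper: point canonicalWebUrl (a real gerrit.config key) at a test hostname, and map that hostname to the held node in /etc/hosts, so login redirects stay on the held node instead of bouncing to production. The hostname `review.held.example`, the IP address, and both function names are made up for illustration.

```python
import re

# Matches the canonicalWebUrl line in git-config syntax, keeping its indentation.
CANONICAL_RE = re.compile(
    r"^([ \t]*canonicalWebUrl[ \t]*=[ \t]*).*$", re.MULTILINE | re.IGNORECASE
)


def set_canonical_web_url(config_text: str, url: str) -> str:
    """Return gerrit.config text with canonicalWebUrl pointed at `url`."""
    new_text, count = CANONICAL_RE.subn(lambda m: m.group(1) + url, config_text)
    if count == 0:
        raise ValueError("no canonicalWebUrl entry found")
    return new_text


def hosts_entry(ip: str, hostname: str) -> str:
    """An /etc/hosts line sending the test hostname to the held node."""
    return f"{ip}\t{hostname}"


if __name__ == "__main__":
    # Hypothetical config and addresses for illustration only.
    config = "[gerrit]\n\tbasePath = git\n\tcanonicalWebUrl = https://review.opendev.org/\n"
    print(set_canonical_web_url(config, "https://review.held.example/"))
    print(hosts_entry("203.0.113.10", "review.held.example"))
```

Using a separate test hostname, rather than remapping the production name as fungi suggests, keeps the production site reachable from the same machine and keeps the two sites' cookies apart.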
*** amoralej is now known as amoralej|off16:29
clarkbok has been updated to indicate I have tested the latest +2'd version of the change16:35
clarkbfunctionally tested it I mean. There are unittests in the change16:36
clarkbalso added a note on that if necessary we can make that stop failing in testinfra artificially and carry the patch ourselves locally. Though I expect I will abandon that change as soon as upstream lands me change16:40
clarkbfungi: re gitea backups i don't think the reboot fixed it. Ping was never the issue it was richer protocols16:47
clarkbfungi: basically the ping test we did post reboot wasn't sufficient to check it16:47
fungiahh, you're right. it was hping or telnet i was testing with to reproduce the failure16:48
fungior mtr's tcp mode maybe16:48
clarkbI keep doing me instead of my16:49
clarkbhave I become a pirate?16:49
fungithis rsync seems to be progressing fairly quickly. i have a feeling the mirror is so large and consists of so many files that the overhead of rsync scanning everything on both sides to work out where it left off so it can resume eats into much of the timeout, leaving little time to actually make progress on each run16:55
fungii wouldn't be surprised if this finishes in a matter of a few hours16:55
fungiassuming it doesn't run out of space, that is16:55
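fungi's hypothesis as a toy model: if each timed-out run first spends some minutes re-scanning both sides before it copies anything, only the rest of the window moves the backlog forward. The ~30 minute window echoes the run fungi found in the log (15:00:35 to 15:30:39 utc); the scan time, transfer rate, and backlog below are hypothetical.

```python
import math
from typing import Optional


def runs_needed(backlog_gb: float, rate_gb_per_min: float,
                timeout_min: float, scan_min: float) -> Optional[int]:
    """Timed-out runs needed to clear the backlog, or None if runs never progress.

    Each run spends `scan_min` re-walking both trees, then copies at
    `rate_gb_per_min` until `timeout_min` expires.
    """
    useful_min = timeout_min - scan_min
    if useful_min <= 0:
        return None  # the scan alone eats the whole timeout
    per_run_gb = useful_min * rate_gb_per_min
    return math.ceil(backlog_gb / per_run_gb)


if __name__ == "__main__":
    # Hypothetical numbers: 120 GB of churn, 1 GB/min, 25 minutes of scanning.
    print(runs_needed(120, 1.0, timeout_min=30, scan_min=25))   # many short runs
    print(runs_needed(120, 1.0, timeout_min=600, scan_min=25))  # one long run
```

With these made-up numbers the same backlog takes 24 thirty-minute runs but a single long run, which is the effect of starting the sync with NO_TIMEOUT=1.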
clarkbfyi I've responded to pointing them at gerrit and docs for pushing to gerrit16:56
clarkbI did not close the PR because I'm wondering if our PR closer is still working16:56
clarkbI don't know what became of that after the dockerization of gerrit and change to mirroring configs16:57
fungilooks like the fedora mirror sync is on to its vos release phase now18:16
opendevreviewMerged openstack/project-config master: Remove noop jobs for deprecated os-panko
opendevreviewMerged openstack/project-config master: Retire django-openstack-auth
fungiclarkb: now i'm starting to wonder if the increase in volume usage was actually divergence between the rw and ro volumes18:59
fungiif so we should see it drop after the vos release completes18:59
clarkbfungi: oh interesting I guess it has to keep the copies of the old stuff until it successfully releases19:37
fungiright, and that part is... taking a while19:38
fungilots of data to transfer apparently, vos release is still in progress19:38
opendevreviewClark Boylan proposed opendev/system-config master: DNM test the rename_repos playbook
clarkbfungi: corvus: ^ fyi thats the hacked together test change I've got for testing rename_repos20:15
clarkb(and why the rename use case came up in my head for the zk key management)20:15
clarkbit's a bit hacked together because the test envs and the prod envs don't all align on their projects and ssh keys20:16
clarkbif you take a look at that it would probably be good to ensure I haven't missed anything obvious in the porting to ensure the testing there is as valid as possible. eg no null success cases20:18
clarkbok time to find a spot outside in the shade and review topic:matrix20:28
fungidebian bullseye release date announced just now as 2021-08-1420:33
fungiso roughly 3 weeks20:33
corvusi hope they hit their target20:34
clarkbcorvus: left some notes on nothing super critical but I think a couple of them may be worth addressing (particularly around reconnecting)21:03
clarkbcorvus: another question I've got is whether your eavesdrop bot uses a token or an actual password? It seems the preference is to use tokens? Not sure if they are equivalent in the code21:17
corvusclarkb: replied with a followup q.21:18
corvusclarkb: tokens are obtained with a password; the bot does that.21:19
corvus(you establish a session with a password, the session is keyed with a token, and you use that session forever)21:19
clarkbcorvus: ok, I noticed because gerritbot wants the token, not a password21:20
clarkbtristanC: corvus  left some thoughts on as well21:20
clarkbcorvus: and responded to your question21:22
corvusclarkb: i don't know the answer re gerritbot.  if there's some out-of-band process to establish a session and get the token, i'm not sure i'm a fan of that.  i think the bot should obtain the token itself and store it locally.  that makes the entire process self-bootstrapping, testable, and can support disaster recovery.21:24
clarkbcorvus: I think there is an out of band process where you can get one. I recall I got one when I created the admin user on the homeserver21:26
opendevreviewJames E. Blair proposed opendev/system-config master: Add matrix-eavesdrop container image
corvusclarkb: ^21:28
clarkb+2 thanks21:29
corvusclarkb: you could run a curl command to log in and then grab that token.  i figured it's friendlier to have the program do that.21:31
clarkbya I wonder if the idea is that tokens can have more limited permissions so you want to prefer tokens for security purposes? but if the account is already limited in its abilities...21:31
corvusclarkb: i don't agree that tokens are preferred21:32
corvusi mean, they are required to use the api -- that's just how the api works21:32
corvusthe question is, what is the input to the application?  a password, or a session token.21:33
corvusthey are exactly equivalent from a security pov21:33
clarkbcorvus: ya I'm not asserting that just wondering if that may be part of the consideration.21:33
clarkbI guess this would be good feedback to tristanC's gerritbot then?21:33
corvussorry, i thought i saw you say they were preferred21:33
corvusin every case, here's the order of operations: 1) send username/password to server in order to obtain session token.  2) save session token  3) use that session token forever to interact with the server.21:34
corvusone choice is to have steps 1,2,3 performed by the bot.21:35
corvusanother choice is to have steps 1,2 performed by humans (and step 2 is to store the token in private ansible vars) and step 3 is performed by the bot21:35
corvusso i think which is preferred has to do with what kind of persistent storage is available to the bot, what its lifecycle is, etc.21:36
corvus(and other automation considerations around it)21:36
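corvus's three steps, sketched with all of them performed by the bot (the "password-as-input" option). Assumptions: the request body follows the Matrix client-server `m.login.password` login flow, whose response carries an `access_token`; the HTTP call is injected as a hypothetical `post_login` callable and the state-file name is made up, so the caching logic runs without a homeserver.

```python
import json
import os
import tempfile
from typing import Callable, Dict


def login_body(user: str, password: str) -> Dict:
    """Request body for the Matrix password login endpoint (POST .../login)."""
    return {
        "type": "m.login.password",
        "identifier": {"type": "m.id.user", "user": user},
        "password": password,
    }


def get_access_token(state_path: str, user: str, password: str,
                     post_login: Callable[[Dict], Dict]) -> str:
    """Return a session token, logging in only if none has been saved yet."""
    if os.path.exists(state_path):
        # Step 3 on every later start: reuse the saved session token.
        with open(state_path) as f:
            return json.load(f)["access_token"]
    # Step 1: exchange username/password for a session token.
    token = post_login(login_body(user, password))["access_token"]
    # Step 2: persist it alongside the bot's other local state.
    with open(state_path, "w") as f:
        json.dump({"access_token": token}, f)
    return token


if __name__ == "__main__":
    calls = []

    def fake_post_login(body: Dict) -> Dict:
        # Stand-in for the real HTTP request; hypothetical token value.
        calls.append(body)
        return {"access_token": "example-token"}

    state = os.path.join(tempfile.mkdtemp(), "bot-state.json")
    get_access_token(state, "gerritbot", "secret", fake_post_login)
    get_access_token(state, "gerritbot", "secret", fake_post_login)
    print("logins performed:", len(calls))
```

The token-as-input alternative simply moves steps 1 and 2 out of the bot: a human performs them and the bot reads the token from its config, which suits a bot with no persistent storage of its own.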
corvusif the bot has no persistent storage and is itself considered ephemeral, then it's probably better to give the bot a token rather than a password, because it would be establishing sessions all the time21:37
corvusbut at least bots that receive messages need to store data to checkpoint their syncs, so storing a token is no big deal.21:38
corvusif gerritbot has no local state whatsoever, and would need to add it merely for the purpose of storing the session token, then that would be a pretty good reason to consider the token-as-input approach21:39
clarkband since gerritbot mostly maps the gerrit event stream into matrix, it is far more ephemeral and doesn't necessarily need storage (though as implemented for us it does)21:39
corvusif it's storing state anyway, then i'd argue password-as-input is more op-friendly21:39
clarkbit's got the yaml dhall config stuff, which I suppose is more an implementation detail than a process requirement21:40
clarkbbut is state in our case21:40
corvusto be clear, i'm okay with token-as-input if that's the way it's written; though we should find out what input we need to provide.  :)21:43
corvus(i have a preference; it's not a strong one, and this is an opportunity to gain experience)21:43
fungipurely from a security pov, i agree if the token has the same privileges as the granting account then there's no security-related reason to avoid giving the account credentials to the application rather than manually issuing a token for it21:57
fungitokens have gained popularity with online services where one account may grant (and perhaps also later revoke) multiple limited-scope tokens for automation21:58
fungisince we can afford to have a separate account for each application we can simply invalidate its account when we no longer need it21:58
fungiand simplify things in the process since there's only one set of credentials to store rather than two21:59
clarkb  22:01
fungido or do not, there is no oops22:02
clarkbI migrated back inside from the shade because the day star moved far enough down towards the horizon to remove my shade and bbq my knees22:02
clarkbon resume from suspend I derped the reconnection22:03
fungibbq knees are a thing here in the south. along with knuckles, scrapple and head cheese22:03
fungi2021-07-23 20:57:10  | Released volume mirror.fedora successfully22:05
fungii'm starting a second run now in the same screen session just to make sure it's ~ a no-op22:05
opendevreviewJames E. Blair proposed opendev/system-config master: Run matrix-eavesdrop on eavesdrop
opendevreviewJames E. Blair proposed opendev/system-config master: Run matrix-gerritbot on eavesdrop
corvusthose are just rebases on the updated first change22:18
clarkbwow got a +1 from zuul I really didn't expect that22:20
clarkbthat implies the rename playbook is working as expected against gitea and gerrit22:20
clarkbfungi: ^ you might want to look that over since it is related to the planned renaming22:24
clarkbfungi: but skimming the job logs it does seem to have run on gerrit and done the rename then test infra checked it after22:24
clarkbalso I think ansible is double logging things; did newer ansible start doing more verbose logging?22:27
clarkbI think the gitea job also did well I see the transfer of orgs for my test and it got the expected http 302 response22:30
fungiclarkb: yeah 802112 looks to me like it did the thing. can we add a permanent test like that?22:41
clarkbfungi: I think we can but we need to converge the prod env and the test envs a bit more so that we can use a consistent ssh key and user22:42
clarkbfungi: I think that is possible; we'll want to change the name of the admin user in the test env and have it use the ssh key for gerrit2? something like that22:42
clarkbfeel free to push new patchsets that converge it a bit more. otherwise I'll try to work on that next week22:43
