Thursday, 2020-07-16

*** ryohayakawa has joined #opendev00:03
fungiianw: that sounds like it could be what we saw on review0100:45
ianwfungi: yeah.  i'm afraid i think bup is a dead end, i mean we knew that, but python3 just seems too far off00:45
ianwi'm investigating rdiff-backup00:46
fungiooh, i used to use rdiff-backup ages ago00:46
ianwthat has a *long* history, which is good00:46
fungilike, i think i remember using it at least 15 years ago00:47
fungiclarkb: you said you had one you really like too, right?00:47
fungii'll readily admit, my personal backups are just lvm snapshots and rsync to remote hosts00:47
ianwthe only trick with it is that it seems python2 and python3 versions of rdiff-backup are incompatible00:48
clarkbI use borgbackup00:57
*** shtepanie has quit IRC01:00
*** xiaolin has joined #opendev01:01
ianwessentially they all look like the same model of doing stuff over ssh01:14
*** ysandeep is now known as ysandeep|afk01:23
*** tkajinam has quit IRC01:55
*** tkajinam has joined #opendev01:55
*** ysandeep|afk is now known as ysandeep01:59
*** ysandeep is now known as ysandeep|afk02:24
openstackgerritIan Wienand proposed opendev/system-config master: [wip] borg backups
*** rh-jelabarre has quit IRC03:46
openstackgerritIan Wienand proposed opendev/system-config master: [wip] borg backups
openstackgerritIan Wienand proposed opendev/system-config master: [wip] borg backups
openstackgerritMerged opendev/system-config master: Added development/rawhide image
openstackgerritIan Wienand proposed opendev/system-config master: [wip] borg backups
*** cloudnull has quit IRC04:52
*** cloudnull has joined #opendev04:53
*** ysandeep|afk is now known as ysandeep|rover05:00
openstackgerritIan Wienand proposed opendev/system-config master: [wip] borg backups
*** marios has joined #opendev05:52
openstackgerritIan Wienand proposed opendev/system-config master: [wip] borg backups
openstackgerritMerged openstack/project-config master: Switch dragonflow ACL to retired
openstackgerritIan Wienand proposed opendev/system-config master: [wip] borg backups
openstackgerritIan Wienand proposed opendev/system-config master: [wip] borg backups
*** zbr has quit IRC07:16
*** ryohayakawa has quit IRC07:20
*** ryohayakawa has joined #opendev07:20
*** bolg has joined #opendev07:27
*** dougsz has joined #opendev07:29
openstackgerritIan Wienand proposed opendev/system-config master: [wip] borg backups
*** ysandeep|rover is now known as ysandeep|lunch07:38
*** moppy has quit IRC08:01
*** moppy has joined #opendev08:02
*** tosky has joined #opendev08:02
*** zbr has joined #opendev08:02
*** tosky has quit IRC08:16
openstackgerritIan Wienand proposed opendev/system-config master: [wip] borg backups
*** tosky has joined #opendev08:33
*** DSpider has joined #opendev08:37
*** zbr has quit IRC08:50
*** ysandeep|lunch is now known as ysandeep|rover09:12
*** dtantsur|afk is now known as dtantsur09:15
*** hiwkby has joined #opendev09:24
*** dougsz has quit IRC09:24
openstackgerritIan Wienand proposed opendev/system-config master: [wip] borg backups
*** hiwkby has quit IRC09:31
*** dougsz has joined #opendev09:38
*** roman_g has joined #opendev09:50
*** bolg has quit IRC09:51
*** bolg has joined #opendev09:56
*** ryohayakawa has quit IRC10:34
*** rh-jelabarre has joined #opendev11:59
*** bolg has quit IRC12:21
*** fressi has joined #opendev12:29
ShadowJonathanpython 3.9 adds a PendingDeprecationWarning to `lib2to3`, and also says it can possibly be removed in the future13:04
fungia quick skim of indicates basically nobody here is using the 2to3 module13:07
fungifor that matter pretty much all projects have already gone python3-only13:08
fungiand back when compatible code was still a goal most of them were relying on the "six" module anyway13:09
fungiat the moment the bigger issues i'm worried about with deprecations are in pbr, and will hopefully be fixed with and or something like those13:14
ShadowJonathanah yeah, still pointing out, and i was mainly saying that as lib2to3 could be a deep dependency for other conversion libraries13:19
*** manfly000 has joined #opendev13:24
fungisure. by now i hope everyone's basically done converting and just drop conversion libs13:28
*** manfly000 has quit IRC14:23
openstackgerritJeremy Stanley proposed opendev/system-config master: Switch reprepro mirroring from Puppet to Ansible
openstackgerritJeremy Stanley proposed opendev/system-config master: Clean up old mirror-update puppetry
clarkbcorvus: mordred I think the ze servers are still in emergency right? we may want to do the container conversion first then switch to zk tls? Or bundle them together explicitly14:43
clarkbotherwise we'll end up with unupdated ze's and be confused I think14:44
corvusoh :(14:45
clarkbthat was a morning shower light bulb14:46
*** tkajinam has quit IRC14:47
clarkbwe could also convert ze01 back to non container, update them all to tls then do the container conversion afterwards14:48
*** zbr|ruck has quit IRC14:55
*** SotK has quit IRC14:55
*** zbr|ruck has joined #opendev14:57
*** SotK has joined #opendev14:57
*** ysandeep|rover is now known as ysandeep|away15:02
corvusi can start on the nodepool side; mordred was going to work on ze's starting today i think.  i think it would be fine to restart them into zk-tls and containers at the same time15:29
*** mlavalle has joined #opendev15:30
*** marios has quit IRC15:31
*** fressi has quit IRC15:59
clarkbcl566n over in #openstack-unregistered reports is not running zuul jobs and I can confirm that seems to be the case. Grepping in the zuul scheduler debug logs for that change number then the associated event ids isn't helping much. It seems we got a ref-updated event when the commit message was updated which we decided we shouldn't do anything with16:07
clarkbthe recheck seems to have been ignored completely?16:08
clarkbanyone else have a moment to look at that? I'm going to switch to getting irc reg/identification sorted so we can use a proper channel16:08
clarkbcorvus: the parent change also seems to have been ignored16:08
clarkbso maybe something to do with it instead?16:08
clarkbI did confirm that the repo seems to have jobs configured as well:
corvusclarkb: the parent is 740836? that seems to have run zuul jobs16:10
corvusoh, but not enqueued?16:11
clarkbcorvus: yes, but about half an hour ago it was approved and should've gone into the gate which it seems to have not done16:11
corvuszuul currently has an event backlog16:11
corvusi think it's doing a reconfig16:12
corvuswhich it just finished16:12
corvusstill nothing on those changes though16:13
fungischeduler cpu utilization and memory pressure look perfectly fine/normal16:13
clarkbis ref-updated the normal event for a new patchset?16:14
clarkbthat seems to be the event we processed for that change roughly when the commit message was updated16:14
clarkbhrm I wonder if we missed the patchset created event or maybe it doesn't emit one if you edit commit message in the browser or something?16:15
clarkbthat doesn't explain why recheck seems to be a noop though16:15
corvusthere is a problem with 74082316:15
fungii'm getting a permission denied error from gerrit for one of the associated changes too16:16
corvus(found in a traceback by doing a context grep)16:16
clarkbfungi: me too16:16
clarkbthat seems to be a child of 740836 but on a separate tree from 74133316:17
clarkbbut I guess if zuul tries to process 740836 via 741333 and then hits the error from 740823 that could short circuit the whole thing?16:17
fungiyeah, the one with [WIP] prefixing the commit message subject line16:18
corvusfungi: 740823 is the error16:18
corvus836 works fine16:18
fungioh, indeed, gerrit wasn't updating the url when i clicked on it16:19
clarkbps9 loads16:19
clarkbhitting ps10 and ps11 doesn't seem to produce an error in the gerrit error log16:20
clarkblooks like those patchsets were being published with the web editor16:20
clarkbmaybe ps10 and ps11 were generated in a corrupt manner due to a bug there?16:20
fungii'm looking in the db16:20
clarkb"those patchsets" the recent ones that we can actually load and render info about16:21
clarkbthere isn't anything about ps10 or ps1116:21
clarkboh wait I'm wrong16:21
fungi10 and 11 were both created 2020-07-15 19:41:15 with the exact same commit id16:21
clarkbPatch Set 10: Published edit on patch set 9.16:21
clarkbPatch Set 11: Published edit on patch set 9.16:21
clarkbya so I'm thinking bug in the web editor created twice as many patchsets as we wanted and that made gerrit sad16:22
clarkbto fix this maybe we can push a ps12 using ps9?16:22
fungiin theory, that would probably work16:22
clarkbok I'll try to repush ps9 now16:22
corvusif that doesn't work, maybe abandon the change16:23
clarkbheh no new changes /me makes a new change16:24
clarkbchange loads in the web ui now16:26
corvus823 and 836 are both in check16:27
clarkbI rechecked 741333 but don't see it yet16:27
corvusit's there now16:27
clarkbcool I'm updating in -unregistered now16:28
*** sshnaidm is now known as sshnaidm|afk16:28
*** dougsz has quit IRC16:35
*** dtantsur is now known as dtantsur|afk16:40
*** ShadowJonathan has quit IRC17:06
*** ShadowJonathan has joined #opendev17:06
*** qchris has quit IRC18:09
*** qchris has joined #opendev18:22
clarkbfungi: left some thoughts on the identity broker spec18:23
*** dougsz has joined #opendev18:24
*** dougsz has quit IRC18:33
*** dougsz has joined #opendev18:33
*** dougsz has quit IRC18:44
*** roman_g has quit IRC18:49
*** roman_g has joined #opendev18:49
*** roman_g has quit IRC18:49
*** roman_g has joined #opendev18:50
corvusoh we didn't actually merge the tls zk change; i've approved it now, and when it goes out, i'll restart the nodepool servers18:52
*** roman_g has quit IRC18:54
fungii've only been half-around, but what's the plan with the executors?18:56
fungicontainerize and roll that into the tls switch for them?18:57
clarkbfungi: yup I think roughly if we merge corvus' change then we can stop ze02, run ansible against it which will both apply the cert stuff and switch it to a container18:58
clarkbze01 is similar process but it will just be the cert stuff and we will need to restart it18:58
clarkbI wonder if mordreds internets are less available than expected today? but I can help with that after lunch18:59
clarkbif anyone has time for I think we can land the system-config and gerritlib changes pretty safely (we've got testing yay)19:06
clarkbthat will check off more of the todo items for doing different git branches in gerrit19:06
clarkband related to that is which is an announcement for wanting to deprecate and remove our /p/ gerrit git repo mirror19:06
clarkbthere are a couple reasons for the /p/ thing one is upstream gerrit is not using that anymore anyway and the other is its one less place to manage git branches19:07
clarkbI'm going to find lunch and will be back in about an hour to help with zk tls things19:08
fungii thought the problem with /p is that it's now used for something completely different in later gerrit releases19:09
fungiand so we have to stop using it for what we're using it for or it will shadow that19:09
clarkbI wasn't sure if it is being repurposed or we've been warned it may be19:10
clarkbmordred: ^ you probably know19:11
fungii think paladox was the first to mention it to me19:11
paladoxyeh, PolyGerrit takes it over for project dashboards19:11
fungiaha! project dashboards, i couldn't remember. thanks paladox!19:12
clarkbcool so it's properly repurposed. paladox do you know which release makes that switch? I can update my message to be specific19:12
paladoxclarkb 2.1619:12
fungiyeah, so we're (probably) going to have to break anyone still cloning from there when we upgrade19:13
paladoxfungi we use a rewrite19:13
paladoxwhich sorts that out19:13
clarkbfungi: yup and its advantageous for us to break them earlier anyway19:13
clarkbbecause maintaining that extra mirror makes dealing with branches harder19:13
clarkbapache logs show its mostly third party ci systems and not a ton of them anyway19:14
fungipaladox: yeah, you had the luxury that folks weren't cloning from /p but rather from /r/p which was being transparently rewritten to /p i guess?19:15
clarkbI'll update my email to better reflect the situation19:15
paladoxfungi i mean you can do /p/ -> /19:15
fungibut that will break project dashboards, right?19:16
paladoxRewriteRule ^/p/(.+)/info/(.+)$ https://<%= @host %>/$1/info/$2 [L,R=301,NE]19:16
paladoxwe don't explicitly redirect /p/* we redirect /p/*/info/*19:16
clarkbif we had significant users I would try that19:17
paladox(which is what git uses)19:17
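A minimal Python sketch of what the RewriteRule quoted above would match, assuming Python's `re` treats this pattern the same way Apache's regex engine does. The `rewrite_p_path` helper, the `review.example.org` host, and the sample paths are hypothetical and only illustrate that `/p/<project>/info/...` requests are redirected while a bare `/p/<project>` path is left alone.

```python
import re

# Same pattern as the RewriteRule quoted above; only paths that contain
# "/info/" after the project name are redirected.
P_INFO = re.compile(r"^/p/(.+)/info/(.+)$")


def rewrite_p_path(path, host="review.example.org"):
    """Return the redirect target for a /p/<project>/info/<rest> path,
    or None when the rule would not fire (hypothetical helper)."""
    match = P_INFO.match(path)
    if match is None:
        return None
    project, rest = match.groups()
    return f"https://{host}/{project}/info/{rest}"


# A git ref-discovery style path is redirected ...
print(rewrite_p_path("/p/opendev/system-config/info/refs"))
# ... while a path without /info/ is not touched by this rule.
print(rewrite_p_path("/p/opendev/system-config"))
```

The first call prints the redirected URL and the second prints `None`, which is the distinction paladox describes: the rule targets `/p/*/info/*` rather than everything under `/p/`.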
clarkbbut since it seems to be minor getting people away from non standard paths would be good19:17
fungii didn't realize git requests always include /info/19:18
clarkbalso if we drop the redirect we have now it will serve it from gerrit directly I think19:19
clarkbthen when we upgrade it will break people.19:19
clarkbSo thats an option if we want to not break people now but also stop mirroring with a warning its going away in the future19:19
fungipaladox: this is how we've been mapping those git requests so far:
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: Fix GCS log upload
openstackgerritMerged opendev/system-config master: Revert "Revert "Add Zookeeper TLS support""
funginow we have to wait for deploy to complete for that ^ i guess19:46
openstackgerritMerged zuul/zuul-jobs master: Fix GCS log upload
clarkbfungi: ya and then we can restart everything but executors19:50
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: Add a test that exercises the GCS Credentials class
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Remove default tox_envlist: venv
*** shtepanie has joined #opendev20:01
clarkbI don't see anything in deploy, and nl01 seems to have the new zk config20:08
clarkbI think that means we could restart services now if we want?20:08
clarkblet me know if I can help /me edits email draft20:09
clarkbfungi: has minor edits now to reflect the info paladox provided. What do you think about explicitly disabling it at the end of the month? I think that way we can stop maintaining the mirror and shake out people who would break early. An alternative is to stop redirecting to our mirror and have gerrit serve it. That way we don't have to bother with the20:11
clarkbmirror but people will break post upgrade.20:11
fungii'm just about done prepping dinner, so should be able to read through it shortly20:12
openstackgerritMerged zuul/zuul-jobs master: Remove default tox_envlist: venv
corvusclarkb: i'll start with restarting nb01; slightly less disruptive20:19
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: Add a test that exercises the GCS Credentials class
corvusi restarted nb01; seems happy20:26
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: Add a test that exercises the GCS Credentials class
clarkbcorvus: it was the launchers in particular that had trouble last time right?20:37
clarkbI guess they will be good canaries for the fix20:37
corvusclarkb: yes i think so; i don't think the builders walk the really big part of the tree20:40
corvusclarkb: i'm going to proceed with the rest of the builders now20:40
corvusclarkb: if you want to start on launchers, i think that'd be fine, otherwise i will when i finish builders20:40
clarkbI'll start with the launchers20:41
clarkbI'll be sure to do a pull too to get the new image with the new kazoo20:41
corvusthat's not automatic?20:42
corvus(i mean, don't we run pulls in ansible or something?)20:42
clarkbwe run pulls hourly20:42
clarkbwhich now that I think about it should be fine20:42
clarkband ya it was a nop20:42
corvuscool; nb02 says 23 hours ago, that sounds right20:42
clarkbnl01 is restarted20:43
clarkbI'll watch it for a bit to ensure the read errors don't return before doing 02-0420:43
clarkbit is happy so far, cleaned up building nodes on start and is building new nodes now20:44
clarkbcorvus: 2281 is the new port right?20:46
clarkb(I'm just double checking with netstat that it is using the port we expect)20:46
clarkbno exceptions yet on nl01, I'll give it another minute then proceed20:47
corvusi finished restarting all the builders20:47
clarkb02 has been restarted now too20:48
clarkband now 03 and 04 are done20:49
clarkbwe have launch errors with limestone but those seem to be in nova apis not zk20:50
clarkbopenstack.exceptions.SDKException: Error in creating the server (no further information available)20:50
clarkbmy favorite error20:50
clarkband that is the only exception I'm seeing across the 4 launchers20:52
fungiokay, sorry, was distracted by awesome gumbo, but back and ready to help now20:58
clarkbI think nodepool is done now and my eyeballs doing pattern matching against scrolled text haven't found anything concerning21:00
clarkbthere is a new launcher exception related to failures to delete instances in ovh21:00
clarkbnothing zk related that I can see21:00
fungiso assuming we're cool with nodepool, zuul mergers next?21:00
clarkband/or scheduler21:01
openstackgerritMerged zuul/zuul-jobs master: Add a test that exercises the GCS Credentials class
corvusyeah, the scheduler is the only one that's actually going to use the connection at this point21:10
fungier, actually, do the mergers connect to zk currently?21:10
corvusthey'll all connect, but just sit idle21:10
fungiyeah, just what i was starting to realize, thanks21:10
fungithey're only acting on signals from gearman21:10
corvusi think we should go ahead and restart the mergers to exercise this first in the least disruptive way21:11
corvusthen we can decide how we want to do the rest (rotate the executors into containers, then do a scheduler restart at the end?  big bang all at once?)21:11
corvusi'll go ahead and do zm01 now21:11
fungisounds good, will keep an eye on its logs21:12
corvusi'm refreshing my memory on whether there's a least-disruptive way to stop a merger21:13
corvuswe should really log job completion in the mergers :/21:15
corvusoh you know what? i don't think the other nodes actually connect to zk yet21:16
corvusi have restarted zm01, and it did not complain about the config file.  that's probably actually all we need to do; i don't think we need to worry about restarting the executors for zk; we can just restart them for containerization with confidence that they aren't going to choke on the new config.21:18
corvusso really it's just the scheduler that's the last remaining important zk bit21:18
fungimakes sense21:18
clarkbsounds good21:31
clarkbmordred: if you're around today I'm able to help with executors for another couple of hours at least so let me know21:31
fungii probably can too, dinner is done and kitchen cleaned up, so mostly just catching up on stuff now and reading through the /p removal announcement21:47
clarkbunrelated, I've been looking at openstack memory usage because it came up elsewhere I couldn't help myself. I've discovered that its possible bionic era journald has a memory leak21:49
clarkbI've got tests running on focal that I should be able to use to compare21:49
clarkbI doubt it will impact us much but how fun is that, the system logger may use many memories :)21:50
fungiclarkb: proposed announcement lgtm. from an implementation detail standpoint i suppose we'll just tweak apache to 404 on /p/.* requests?21:50
clarkbfungi: yup that was what I was thinking or 403 forbidden?21:50
clarkbsince technically on older gerrit its still valid21:50
clarkbforbidden makes that distinction a bit more clear21:50
fungii agree with that logic21:51
clarkbcool I think tomorrow will be email day then I'll try to get that out as well as followup on advisory board21:51
*** DSpider has quit IRC22:17
*** xiaolin has quit IRC22:24
*** xiaolin has joined #opendev22:24
openstackgerritIan Wienand proposed opendev/system-config master: [wip] borg backups
clarkbianw: I see you've turned off encryption (which is encryption at rest not over the wire) to avoid managing that additional secret? Also, have you set it up to do append only backups? I know that corvus is a fan of that and borg does support it.22:26
clarkbianw: the only other thing I was going to call out before review too closely is that there were upgrade concerns if your borg was too old. I think focal would be well new enough but xenial may not be22:27
clarkb(not sure if you plan to convert everything)22:27
ianwclarkb: yep, so yeah i did some research and borg seemed probably the best way22:27
clarkbbut overall I've been really happy with it locally and am happy to review/help if I can. I rely on the encrypted at rest feature as I backup to my brothers house offsite and love that you can fuse mount backups for easy recovery and browsing. I don't do append only though22:28
ianwin terms of versions; yes it seems it is very wise to a) pin the version you use and b) use the same version on both sides, which is why i've taken a pip install approach22:28
ianwi am not really convinced a container is a good idea for this, although i guess possible.  a) because its whole purpose is to walk the base file system and b) you want as little between you and a successful backup as possible22:29
clarkbya I don't run it in containers locally either22:30
ianwyeah for encryption, my thinking was that we explicitly trust the remote end, i mean as much as we trust the cloud providers to run the unencrypted vm's anyway, so an extra key seems like another hurdle to restores and another thing to lose22:32
ianwbut yeah, i'll write up a proper changelog :)  i want it to run a test backup in the gate22:33
clarkbas a followon to the journald thing it seems that focal isn't much better which implies something else is going on. I suppose its possible we log so much all at once during openstack testing that we simply cause journald to bloat in memory use22:37
clarkband if disk io were quicker we'd keep that to a lower level of memory use?22:37
clarkbthat makes me more comfortable about our prod servers22:37
ianwfungi: 3 pretty straight forward ones that update some system-config testing bits if you have time :
ianwclarkb: where was the journald thing raised?22:41
clarkbianw: I was just poking at it because a discussion about openstack memory use came up in the tc channel22:42
corvusthe append-only part helps if we need forensics after a security incident; i'd still like that if we can, but i'll concede it's a secondary goal22:42
ianwahh, yeah a couple of people seem to have popped up lately asking about memory22:42
clarkband looking at jobs journald uses a fair bit of memory (more than I would have expected) and from there found a bug fixed early 2019 to correct a memory leak22:42
clarkbso wanted to compare bionic to focal results as the memory leak fix should be in focal but possibly not bionic22:43
corvusianw: are you looking at doing append-only?22:43
clarkbcorvus: it should be doable to have append only backups, but we'd need to test that as I have no existing experience with append only and borg22:43
ianwcorvus: right now i'm just looking at it doing anything at all -- but it has many flags and good documentation about things to tweak.  i'll look at append only before final review22:44
ianwthere's other flags about having --readonly / and stuff that seem good as well22:45
corvuscool :)22:45
fungiianw: 739412 needs a rebase for an updated parent22:50
ianwfungi: oh i should have cut that out ... i guess we want to go with the dns records22:52
fungiahh, yep sshfp rrs worked great in our test22:52
*** mlavalle has quit IRC22:53
*** tkajinam has joined #opendev23:02
*** tosky has quit IRC23:04
openstackgerritIan Wienand proposed opendev/system-config master: [wip] borg backups
*** zbr|ruck has quit IRC23:16
*** SotK has quit IRC23:16
clarkbianw: if you get a chance before you call it a week reviews on would be helpful (I'm trying to push on that a step at a time as I'm able)23:21
clarkbseems we get a fairly consistent set of question around it for which is shareable23:21
*** zbr|ruck has joined #opendev23:23
*** SotK has joined #opendev23:23
ianwfungi: i'm not sure i have the patience to get to a situation of dnssec working between my local unifi environment and my work vm attached to the redhat vpn23:23
ianwi think dnssec-trigger is required to be involved23:24
ianwclarkb: not really related but  /org/{org}/repos says it's deprecated in the gitea api overview23:30
ianwoh i guess it's become "orgS"23:31
clarkboh neat we can update that too23:32
clarkbits fairly well tested too if you want to push that change23:32
ianwclarkb: for 741277 shouldn't we see the branch being set to "main" for test-repo-2 ~  ?23:39
fungiianw: funny you should mention that, i just a few moments ago upgraded glibc on my workstation and now it needs an extra option set in resolv.conf to tell it not to clear ad flags23:41
clarkbianw: it should only be in the jeepyb change as that updates the call to set it
clarkbianw seems I did miss something
clarkbmain is used later though23:43
clarkbI'll look into why the gitreview push was missed. I grepped for master but maybe its another file23:44
openstackgerritMerged opendev/system-config master: Copy generated inventory to bridge logs
openstackgerritMerged opendev/system-config master: testinfra: silence yaml.load() warnings
openstackgerritMerged opendev/system-config master: Fix junit error, add HTML report
fungi"Starting with glibc 2.31, the DNS stub resolver does not blindly trust the AD (authenticated data) flag, indicating a DNSSEC validation. By default the name servers and the network path to them are treated as untrusted. In this mode, the AD flag is not set in queries, and it is automatically cleared in responses, indicating a lack of DNSSEC validation."23:47
fungiso apparently i now have to include "options trust-ad" in my /etc/resolv.conf23:47
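A small Python sketch of checking whether a resolv.conf already carries the `trust-ad` option fungi mentions. The `has_trust_ad` helper and the sample file contents are hypothetical, and the parsing is deliberately simplified: it only skips lines that start with `#` or `;` and looks for `trust-ad` on any `options` line.

```python
def has_trust_ad(resolv_conf_text):
    """Return True if any 'options' line lists trust-ad (hypothetical helper)."""
    for line in resolv_conf_text.splitlines():
        # Lines with '#' or ';' in the first column are comments.
        if line.startswith(("#", ";")):
            continue
        fields = line.split()
        if fields and fields[0] == "options" and "trust-ad" in fields[1:]:
            return True
    return False


# Sample contents are made up for illustration.
sample = "nameserver 127.0.0.1\noptions edns0 trust-ad\n"
print(has_trust_ad(sample))
print(has_trust_ad("nameserver 127.0.0.1\n"))
```

Pointing the helper at the text of `/etc/resolv.conf` would show whether the option from the quoted glibc 2.31 note is set on a given host.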
clarkbyup its in the utils file and I must've grepped poorly23:48
clarkbcan `git push origin HEAD:refs/heads/master` be safely rewritten as `git push origin HEAD:remotes/origin/HEAD` ?23:52
fungihuh, glibc uses gerrit now?
clarkblooks like if we do a remote update first that may work23:55
clarkboh hrm this git init is used as our permanent cache too23:56
clarkbso we'd need to init it with the right default branch there as well anyway23:57
clarkbI was trying to simplify and not need to know what the default branch is going to be but I think we have to know23:57
