Thursday, 2023-04-20

ianw i seem to have stuffed up the indenting acl change
ianwsomehow i've pushed the other two patches ontop of an old version of it00:52
opendevreviewIan Wienand proposed openstack/project-config master: gerrit/acls : add NO_CODE_CHANGE
opendevreviewIan Wienand proposed openstack/project-config master: acl : remove NO_CODE_CHANGE from Allow-Post-Review
opendevreviewIan Wienand proposed openstack/project-config master: Indent Gerrit ACL options
ianwi will admit it's impossible to know what normalization rules apply ATM without just reading the source00:59
opendevreviewIan Wienand proposed openstack/project-config master: tools/ Add some human readable output
opendevreviewMerged openstack/project-config master: gerrit/acls : add NO_CODE_CHANGE
opendevreviewMerged openstack/project-config master: acl : remove NO_CODE_CHANGE from Allow-Post-Review
opendevreviewMerged openstack/diskimage-builder master: Allow custom console=tty0 argument
opendevreviewIan Wienand proposed openstack/project-config master: tools/ Add some human readable output
ianwclarkb/fungi: i'm willing to manually apply the tab indent change if you think ^ is enough to avert the problem of increasing complexity in the normalisation tool04:35
opendevreviewChing Kuo proposed opendev/system-config master: Update Hound to Use Python 3.11 Base Images
opendevreviewIan Wienand proposed opendev/ master: Add Jammy refresh NS records
opendevreviewIan Wienand proposed opendev/ master: Remove old nameservers
opendevreviewIan Wienand proposed opendev/ master: Add Jammy refresh NS records
opendevreviewIan Wienand proposed opendev/ master: Remove old nameservers
genekuoianw: clarkb: Will you recommending update all containers to python 3.10 or later in one patch review or do it in multiple patches? I'm currently doing it in multiple patches.07:01 has address has IPv6 address 2604:e100:1:0:f816:3eff:feff:bd1c07:02
ianwmnaser: ^ is there any chance we can get these ip's updated for reverse dns?07:03
ianwns04 is a jammy server in vexxhost to replace the existing ns2 running there07:04
ianwgenekuo: i would say one thing in one change.  with gerrit and zuul we've very much about stacking smaller changes ontop of each other.  it will all be combined and tested correctly07:05
ianwgenekuo: i'm looking at the screenshots from the job you can see at
ianwit does seem there's something up with the overlay 07:10
ianwmy next thought is to look at the build history for that job @
ianw is a prior run of this job.  looking at the screenshots there, it doesn't seem like it's got this overlay thing happening07:11
ianwwhich makes me a bit worried it's related to this change07:11
ianwwhich is weird because this is a go app.  but i wonder if rebuilding it something has changed upstream?07:12
genekuoIs it possible to trigger a rebuild without my changes (updating python) and see the result?07:12
genekuoIn this case we can see if there's something changed with the go app or my change actually cause the overlay issue.07:13
ianwyes, we often will push changes like this with a [dnm] prefix on the subject (do not merge) -- so do something like make a new branch and just update a comment in the dockerfile to get it to rebuild07:16
opendevreviewChing Kuo proposed opendev/system-config master: [dnm] Test Hound Build
*** amoralej|off is now known as amoralej07:19
ianwyep, just like that :)  the dnm just flags it for us to know not to bother reviewing etc, it doesn't mean anything to zuul/gerrit07:19
genekuoThanks for the tip07:21
ianwfor this, we use the python base image to generate the hound config file with jeepyb IIRC07:22
*** jpena|off is now known as jpena07:23
ianwjeepyb (the jee is suppoed to sound like "g") helps manage gerrit and projects07:23
ianwhound itself is a go app, again, iirc.  so the python update seems unlikely to be the cause of this really07:24
ianwthe other thing it could be is selenium, which we use to take the screenshots.  something going wrong there ...07:25
ianw... if you find digging into this sort of thing interesting, then you're in the right place :)07:26
ianwselenium gets started by this playbook ->
ianwwhich runs it from this role ->
ianwand the screenshosts are taken in
ianwyep that's the implementation07:32
ianwso if they just pushed a new selenium container or something, that might be it...07:33
genekuolast push 16 days ago07:33
genekuothe last job is only 7 days07:34
ianwso one way to narrow it down would be to use an older tag for the selenium container in the DNM job07:35
ianw(if other things are making us think it is a generic selenium issue)07:35
ianw is quite suspicious07:37
ianwchange Advanced options div ID07:37
ianwps. please never write one line subject-only commit messages with no context :)07:38
ianwthe thing that looks incorrectly overlayed is the advanced options07:38
genekuofrom the commit I think it's only a name change for the tag07:38
ianwagree it seems unlikely.  and also, we would have had this change in some of the prior runs, that look ok07:40
ianwthe selenium logs from the last run, that looked good, are
ianwthe "bad" log is
ianwthey both say "Started Selenium Standalone 4.8.3 (revision b19b418e60)"07:46
ianwso that actually makes me think not much as changed there ...07:46
ianwthe selenium container is running on the test codesearch node -- to see that log you need to go to "logs -> codesearch01 -> docker"07:47
genekuoI did a quick diff of both logs, I don't find any major difference besides timestamps and some IDs07:48
ianwclarkb/fungi: i think that is pretty much my final plan for the dns upgrades.  a few ??? over who control some of the domains.  other than that, i think we can get ns03/ns04 deployed, and do the domain changes after that07:49
ianwgenekuo: if we can't figure it out from probing like this, we can make the job fail in the dnm test (just put a fake assert in testinfra) and tell zuul to hold the failed node, and we can log in and probe it, etc.07:51
ianwholding nodes on job failure is something an admin needs to setup, however.07:51
genekuoI see, let's wait for the dnm one build and see if the issue is there.07:52
ianwit's a bit of a race, but you can also look at the console output of the running job (currently
ianwthat will give you the ip of the nodes being tested07:53
ianwin this case, for codesearch01.  there are a few minutes where this has deployed and is running testinfra, etc. where the host is up and you can try to connect07:54
ianwonce the job finishes, zuul is going to destroy the node, but sometimes it's enough time to get an idea of what's going on07:54
ianwi'm afraid i've got to head off, but feel free to keep sending things through on that dnm change and poking and prodding.  i know it's a little slow way to iterate, but gives the best view of what's going on07:57
genekuono problem, let me try digging through by myself07:59
ianwthanks!  appreciate you helping out! :)08:00
genekuooverlay still exist in the dnm build, so likely not related to python version update08:13
opendevreviewArtem Goncharov proposed openstack/project-config master: Add sdk and cli xxx-service-core group
opendevreviewLucas Alvares Gomes proposed openstack/project-config master: Stop using Storyboard for ovn-bgp-agent
*** amoralej is now known as amoralej|lunch11:01
opendevreviewMerged openstack/project-config master: Add sdk and cli xxx-service-core group
gtemadear infra, would you please add me as a member into following newly created gerrit groups: openstacksdk-service-core (33c03b5e000a75ffc476c51d484c5111648d399c) and 12:15
gtemapython-openstackclient-service-core (a5d3fd5d65aea9bb530989920f8a5664d5808cbc).12:15
*** amoralej|lunch is now known as amoralej12:18
opendevreviewLucas Alvares Gomes proposed openstack/project-config master: Stop using Storyboard for ovn-bgp-agent
ykarelHi is there some issue with vexxhost mirrors, seeing Could not connect to (2604:e100:1:0:f816:3eff:fe0c:e2c0). - connect (113: No route to host) Could not connect to ( - connect (113: No route to host)12:32
ykarelhappens randomly from quite some time12:35
ykarelmnaser, may be you know ^?12:36
fricklergtema: done12:50
gtemathanks a lot frickler12:51
fungiykarel: might be worth looking at the routes we recorded for the node too, maybe something is shadowing the normal default route there (e.g. stray v6 router advertisements on the network)13:10
fungiso checking against the default routes in other successful builds in that region13:13
fungiskimming the syslog from that build, i don't see any indication of route changes during the short lifetime of that node:
ykarelfungi, hmm it good looks there14:04
fungiare those the normal default gateways in that network then?14:05
funginot some other random test node that leaked an ra out into the lan and got picked up by that node?14:05
fricklerinfra-root: seems gerrit is doing some weird handling of the ptl-approved label here
fricklerwhy is that condition not marked as satisfied?14:06
ykarelfungi, yes i see same in both good and bad case 2604:e100:1::/64 dev ens3 proto kernel metric 25614:06
ykarelalso with opensearch i see all the failures are on 4 hosts14:07
fungifrickler: i guess "-label:PTL-Approved=MIN" isn't doing what we expect? looks like it wants all reviewers to not have the minimum vote (which is 0) and two of them have a 0 vote14:07
fungii guess we should have gone with max after all14:08
fricklerfungi: that's what I thought at first, too, but compare to
fungihuh, that one says it requires "label:Verified=MAX AND -label:Verified=MIN"14:09
fungino, sorry, wrong condition14:09
fungiyeah, still "14:09
fungi-label:PTL-Approved=MIN" but there is indeed someone with a 0 vote14:10
fungiand yet it considered it satisfied14:10
fricklerso maybe zuul's "PTL-Approved 0 (vote reset)" is different from not voting on that label at all?14:10
fungifrickler: i think the difference is that elodilles had added ptl-approved +1 and then removed it14:11
fungiyes, exactly, so it's seeing an explicit MIN vote rather than an implicit one14:11
fricklerbut one cannot tell the difference in the UI except by tracking the log, I guess14:12
frickleralso zuul set the 0 after elodilles review, the PTL +1 is still there, so that looks like an issue with the zuul check, too14:12
fungielodilles: commented explaining it as "note that Rafael's email address is different in gerrit and in governance projects.yaml, that's why it is not set"14:14
fungimaybe there was an address change?14:14
elodilleswhat i thought is that check-release-approval sets 0, and thus my PTL-Approved+1 is disregarded14:14
elodillesyes, probably, but that does not explain why my PTL-Approved+1 is disregarded14:15
fungilooks like the ptl's +1 happened on march 29, then the change sat untouched until a comment on april 6, so possibly the ptl's address changed sometime in the week between those events14:16
fungielodilles: i think we're seeing a vote reset to 0 counting as a MIN vote, and the condition says to block when that's the case14:17
fungiso we probably have to go with MAX instead, and give up on the future idea of having zuul and release managers leave different vote ranges (+1 vs +2)14:18
ykarelfungi, added good/bad nodes 14:20
frickler"gerrit query 878864  --all-approvals" only shows the +1 for P-A, though14:20
fungii agree. i wonder if this is gerrit inconsistently sometimes treating 0 as an implicit non-vote and sometimes as an explicit MIN vote14:22
clarkbvote reset to 0 isn't really a reset iirc. The default state is there are no votes and that is treated as 0 in the UI but really no votes have been recorded. When you vote 0 explicitly after say +/-1 that is an actual 0 vote recorded14:28
fricklerptl email seems indeed different in governance vs. gerrit14:31
elodillesclarkb: so this means, because check-release-approval once have set +1, and then reset it to 0, it remains there forever, even if we set PTL-Approved+1 by ourselves14:34
fricklermaybe you can abort/restore or push a new PS to reset14:34
clarkbelodilles: I think whoever voted 0 needs to vote +1 that supercedes the zero14:35
clarkbinfra-root I think I understand the issue causing errors with replication and it is the same issue causing us to leak edit- ref replication tasks14:36
fricklerfor zuul to vote +1 again I think the PTL would need to fix the email mismatch and then resubmit their CR+114:37
elodillesclarkb: which is in this case was the check-release-approval job (though it does not seem to appear in 'votes' but leaves the status as 'UNSETISFIED')14:38
elodillesfrickler: actually 'fix the emai' is enough then any comment / review will trigger the check-release-approval job and will set it to +114:39
fricklerah, right. but that's likely the easier part anyway ;)14:45
elodillesyes :)14:48
fungibut current situation aside, i think using a not-min condition with a range where the min value is 0 isn't going to work quite as intended14:49
clarkbyes it will be confusing at the very least15:04
fungithe original idea was that we might give zuul 0..+2 permission and release managers 0..+1 and simply require at least one positive vote to merge, but it looks like this is not the way to express it (if there even is a way)15:05
fungibasically wanting to make it easy to tell a release manager override from a zuul vote at a glance, but i expect querying based on user instead is going to need to be enough15:07
fricklercould one have "label:Verified>MIN" or is there only =/-= ?15:07
clarkbfungi: I'm not sure I understand why you would need +2 zuul or +1 human if zuul can also +115:09
clarkbits equivalent to just always +1 in that case15:09
fungihumans can't control zuul, so limiting release manager voting range to +1 means that a release manager's vote can't masquerade as zuul's vote15:10
fricklerthe idea was to be able to distinguish a "real" PTL-approved from an "release manager override" approval15:11
fungithey were looking for a way to easily identify what changes had been merged with a release manager override on that label and hoped they could do it by differentiating the vote number15:11
clarkbisn't the vote supplier the way you do that?15:11
fungiyes, i think they're going to need to do that by querying for votes left by not-zuul15:11
fungiand just use the same allowed range for both zuul and release managers15:12
fungiand then they can set the requirement to MAX instead of -MIN15:12
fungialso i'm not having any luck figuring out where in the gerrit docs the rule syntax is explained. doesn't seem to be in the access control docs15:14
clarkbits the query syntax15:15
clarkbI think the link to it from teh ACL docs15:15
fungioh, those submit rules reuse query syntax... why hadn't i noticed that?15:16
clarkbI used that to debug the case issue15:16
clarkbyou can just copy the conditions and query on them and see what gerrit thinks. It is nice for debugging15:17
fungiso yeah, seems like PTL-Approved>MIN should work15:19
fungi"Can also be used with the {MAX, MIN, ANY} label votes. All operators >, >=, <, ⇐ are supported. "15:20
clarkbthe entire motivation behind submit requirements was to shift from prolog that no one understood to a system that typical gerrit users that understand gerrit query language could implement15:21
opendevreviewJeremy Stanley proposed openstack/project-config master: PTL-Approved uses inequality rather than negation
fungielodilles: frickler: clarkb: that ^ should in theory behave as intended15:24
elodillesfungi: thanks! looks good to me, though i might not see all the details / side cases how this will behave in our way of working / dashboard15:34
fungiwell, yeah, clearly i didn't either since i was oblivious to the current issue15:36
clarkbI've asked on gerrit discord if anyone can take a look to tell me if I am on the right track and if so I can work on some patches15:57
clarkbthe error spam we get is a subset of the second issue I think. Basically I strongly suspect that what happens is we stick a replication task in waiting/ for a ref or refs. Then the plugin filters all of those refs out of the list of things to replicate due to anonymous users permissions. In that specific case it tries to rename a new file in waiting/ for the hashed content of the15:58
clarkbempty list and tha tfails because it was never created in the first place15:58
clarkbThat failure leaks the original file in waiting/ with the refs that get filtered. Then we restart gerrit and get a bunch of errors.15:59
clarkbI think my workaround change to clear out the entries on disk will only address a subset of those cases currently so we won't necessarily prevent all errorswith my workaround at this point15:59
clarkbThe other variant of this situation is when only some refs are filtered leaving some refs to be replicated. In that case  Ithink we create the new file in waiting orphaning the original but the new one is able to be moved and deleted normally16:00
opendevreviewMerged openstack/project-config master: PTL-Approved uses inequality rather than negation
corvusclarkb:  here's the updated container copy script i'm using for zuul:
clarkblooks like I can edit that to do an image at a time too which will be good for figuring it out16:10
corvusclarkb: oops paste error:
corvusi added a change_ filter so we don't copy leaked pre-promote tags16:11
corvusclarkb: also, i just temporarily added my personal account to the 'automation tools' group so i have write perms.  will remove at end of process.16:11
clarkbhrm was that necessary? I thought our admin accounts had the bits necessary too but I would have to look at the setup again to refresh my memory16:12
*** jpena is now known as jpena|off16:17
clarkbit just occurred to me that most people using the replication plugin are probably doig replication for gerrit to gerrit replicas and/or HA. In those cases they do want to replicate everything and wouldn't run into these problems.16:43
*** amoralej is now known as amoralej|off16:56
clarkbinfra-root it has been ~17 hours since new etherpad02 took over etherpad duties. What is our comfort level for cleaning up etherpad01? SHold I unwip those changes now?17:10
clarkbI've gone ahead and removed my WIP from and
fungiprobably the only thing of value on the old server is my shell history for examples of calling the etherpad api, and if memory serves i eventually documented how to do that anyway17:37
fungiyeah, has relevant examples already17:38
clarkbI was looking at clearing up a held message in service-discuss and hit f5 to refresh the page I had open from yesterday. I love that firefox warns about resending data to do that (since I had allowed a message through yesterday and didn't need to repost that). Also fungi appears to have taken care of the moderation for the newer email already17:44
clarkblong way to say browser features that keep users from doing a silly are great17:44
fungioh, yep i usually try to check them all at least once a day17:46
fungigonna disappear for a bit to grab early dinner out since the weather's nice, but will be back in an hour or so19:14
clarkbya I'm having a lazy lunch here, but the weather is not nice19:30
fungii'm back, and yeah the weather was just too hard to pass up20:31
clarkbI've just spot checked backups for etherpad02 and they look good. I was able to borg mount and head the mysql file and ls'ing the fs backups seems to show it working there too20:39
clarkbso ya I think I'm ready to land those two chagnes to cleanup etherpad01 whenever others are happy too20:39
corvusquick question on nameserver replacement -- ns03 and ns04 don't seem to be answering queries right now... what's the sequencing?20:39
corvusi ask because i'm reviewing ianw's change at which adds them to gating.dev20:40
corvusbut i think probably they should answer queries before we do that?20:40
clarkbcorvus: captures the full plan and where we are at (completed steps are struck through)20:44
clarkbI was just reviewing it again to catch up on the latest updates20:45
corvusokay, so it looks like that change is step 7, and make new servers is step 6, and we're currently at step 420:46
corvusso that change should be considered wip until we get past step 620:46
corvuscool thx20:50
fungiokay, the acl update for openstack/releases still doesn't seem to have done what i thought it would... in you can see that it thinks the "label:PTL-Approved>MIN" submit condition is not satisfied even though there is a +1 vote. does it need every vote to match that condition, not just one or more?21:00
fungiis so, then i don't think setting it to =MAX is going to work either21:01
fungior do we need to add a count condition?21:01
fungi seems to indicate that's the syntax at least21:02
fungino, oddly =MAX seems to match, at least with the search window21:04
clarkbI would use the search queries to figure that out21:04
fungicount>0 throws an error, count>=1 doesn't match for some reason21:04
fungiyeah, i should have tested it that way21:04
fungi"label:PTL-Approved>=1" also seems to work. this is a baffling parser21:06
fungi"label:PTL-Approved>0" also works but "label:PTL-Approved>MIN" doesn't even though MIN is 0 for that label21:07
fungii'll push a change in a few minutes unless someone points out i'm misreading this21:08
clarkbI think your experimenting is probably the best anyone understands it :)21:09
opendevreviewJeremy Stanley proposed openstack/project-config master: Switch PTL-Approved rule comparison from MIN to 0
fungielodilles: frickler: clarkb: ^ happy to shift approach if someone can actually figure out what's most appropriate21:38
fungibut at least that seems like it works21:38
ianwcorvus: sorry for the confusion; i meant to -2 but forgot in the process of making all the changes22:11
corvusianw: no prob, just making sure we're following the same script :)22:12
clarkbplease weigh in on if you think we're done with etherpad01 at this point :) also the dns record upate to removeetherpad01 recrods will conflict with ianw's dns zone updates for opendev.22:23
clarkbianw: also you may be interested in the two bugs I filed against the replication plugin if you haven't seen them already (links in scrollback somewhere)22:24
ianwclarkb: have we double checked the backups for etherpad02?22:24
clarkbianw: I borg mounted and ran head against the db backup and checked the fs backups using ls. They looked good to me. But having a second person check is a good idea22:25
ianwexcellent, i didn't mount but double-checked hte logs and the remote side, lgtm22:28
ianwfungi there is a specific endpoint for testing ->
corvusinfra-root: fyi we just merged a change to nodepool which will increase our log chattiness, but i think it's worth it.  just fyi.22:39
corvus change in question22:40
ianwcurl -u user:pass -X POST -H 'Content-Type: application/json; charset=UTF-8'  '' -d '{"name": "Code-Review", "submittability_expression": "label:PTL-Approved>MIN"}'22:59
ianwis the call i made.  i do not know what that had to be authenticated, it doesn't mention it in the docs22:59
fungineat. and >0 still works where >MIN does not23:00
ianwright, >0 makes it PASS23:01
ianwi guess the escaped unicode is a valid json thing23:01
ianwbut surely this is a bug?23:02
ianwi guess does *explicitly* list >, < for MAX/MIN/ANY23:09
ianwdoes NOT i mean23:09
fungihuh, interesting. i must surely be misreading "Can also be used with the {MAX, MIN, ANY} label votes. All operators >, >=, <, ⇐ are supported."23:17
clarkb not sure how I feel about this23:18
fungii guess those sentences being adjacent isn't intended to convey the meaning i took from their proximity to one another23:18
fungiclarkb: probably worth someone asking how well supported that is beyond github, or whether it's designed to create a special github ruling caste23:21
clarkbfungi: the hacker news thread alredy has a couple people calling that out23:21
clarkb does make it look like only github is currently supported23:22
clarkbI've got a family dinner tonight in not too long but I'll plan to approve tomorrow morning23:24
fungiunfortunately the cpython folk have grown substantial github tunnel vision since moving their code hosting and then their bug tracking, ci, cla management, governance, et cetera to be wholly dependent on it23:25
fungithey're also using some new github feature that provides virtual development systems so that their core devs no longer need local dev environments23:26
opendevreviewMerged openstack/project-config master: Switch PTL-Approved rule comparison from MIN to 0
fungibut yeah, i suppose the irony of calling that "trusted publishing" is sort of lost on them, since they do implicitly trust github over their own authentication systems23:29
ianwclarkb: i'd probably like to rebase the ns changes on it, so happy to merge and supervise 880087 if you like?23:31
clarkbianw: works for me23:31
clarkbI just have to pop out soon for family time23:31
ianwsounds good to me, i'd like to try and get the new nameservers responding today, since we seem ok with the plan23:35
opendevreviewMerged opendev/system-config master: Remove etherpad01 from inventory
clarkbyup the plan lgtm I left a couple more notes on the registrar and glue record stuff23:47
clarkbbasically I don't think glue records hurt23:47
ianwi think the only thing is it means you can't replace the nameservers without updating those glue records23:52
ianwbut i mean, that's why they're called glue records, heh :)23:52
ianwin the situation where we're rolling to new hosts with new ip's, it's much of a muchness23:53
ianw is another change to clarify the search situation.  even if nobody reviews it at least it's there23:55

Generated by 2.17.3 by Marius Gedminas - find it at!