Thursday, 2022-01-20

opendevreviewIan Wienand proposed opendev/system-config master: grafana: update to oss latest release
opendevreviewZhouHeng proposed openstack/project-config master: Retire neutron-fwaas*
fungiclarkb: use ${hash} in place of all the ${commit} macros, or just in the file option?00:41
Clark[m]fungi: my change only modified the file option00:41
Clark[m]It is very proof of concept :)00:42
fungiahh, okay thanks00:42
opendevreviewJeremy Stanley proposed opendev/system-config master: Drop gitweb dependencies
opendevreviewJeremy Stanley proposed opendev/system-config master: Fix mixed spaces and hard tabs in Gerrit config
opendevreviewJeremy Stanley proposed opendev/system-config master: Use Gitea for Gerrit's code browser URLs
opendevreviewJeremy Stanley proposed opendev/system-config master: DNM: Fail our Gerrit testing for an autohold
fungihopefully i'll have a held node to inspect when i wake up00:45
fungiClark[m]: it works!
opendevreviewEduardo Santos proposed openstack/diskimage-builder master: Fix openSUSE images and bump them to 15.3
*** frenzy_friday is now known as frenzyfriday|ruck03:45
*** mtreinish_ is now known as mtreinish05:13
*** ysandeep|away is now known as ysandeep05:36
*** ysandeep is now known as ysandeep|brb06:30
*** ysandeep|brb is now known as ysandeep07:14
opendevreviewGregory Thiemonge proposed openstack/project-config master: Adding nested-virt pools for centos-9-stream
chkumar|roverHello infra, is failing with mirror issue08:31
chkumar|roverStatus code: 404 for (IP:
chkumar|roverplease have a look, thanks!08:32
*** jpena|off is now known as jpena08:37
chkumar|roverIt seems all are coming from
opendevreviewchandan kumar proposed opendev/system-config master: Use facebook mirror for CentOS Stream 8
*** ysandeep is now known as ysandeep|lunch09:57
opendevreviewchandan kumar proposed opendev/system-config master: Use facebook mirror for CentOS Stream 8
*** ysandeep|lunch is now known as ysandeep10:48
*** rlandy|out is now known as rlandy|ruck11:15
*** priteau_ is now known as priteau11:16
*** dviroel__ is now known as dviroel11:25
*** frenzy_friday is now known as frenzyfriday|ruck11:42
*** bhagyashris__ is now known as bhagyashris11:49
*** ysandeep is now known as ysandeep|brb12:06
fungichkumar|rover: is a different mirror, so what do you mean?12:46
fungioh, i see, jobs running outside rax-dfw are trying to access a mirror url which can only be reached by systems located there12:48
fungilooks like whatever changed happened between the last successful run at 2022-01-20 02:05:51 and the first retry at 2022-01-20 07:59:0912:52
fungiso roughly a 6-hour window12:52
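For reference, the window fungi describes can be checked with a few lines of Python. This is only a sketch of the arithmetic, using the two timestamps quoted above; the variable names are illustrative.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Timestamps quoted in the log: last successful run and first retry.
last_success = datetime.strptime("2022-01-20 02:05:51", FMT)
first_retry = datetime.strptime("2022-01-20 07:59:09", FMT)

# Whatever changed must have happened inside this interval.
window = first_retry - last_success
hours = window.total_seconds() / 3600

print(f"window: {window} (about {hours:.1f} hours)")
```

The result is a little under six hours, which matches the "roughly a 6-hour window" estimate.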
jrosseri see the same thing on openstack-ansible jobs too13:01
fungiyeah, broadening the search:
fungilooks like it's stuff using centos-8-stream13:07
fungiso maybe an image update happened in that timeframe and it's the result of something which changed in dib13:07
fungithough i think we consume dib from tagged releases, and there's not been a new release of dib in ~1.5 months13:08
jrosserthe irony is that my job failing due to this mirror thing was to help understand why /etc/ssh/sshd_config.d seemed missing on the centos-8-stream nodes13:13
fungilooks like new centos-8-stream images uploaded roughly 7-8 hours ago in these regions: iweb-mtl01, ovh-bhs1, ovh-gra1, rax-dfw13:14
fungithe images in these regions are still over a day old: airship-kna1, inmotion-iad3, limestone-regionone, rax-iad, rax-ord, vexxhost-ca-ymq-1, vexxhost-sjc113:17
fungiso if we see any failures in the latter, then it's probably not from updated centos-8-stream images in that timeframe13:17
fungiso i think that rules out it being a problem in a new image13:19
fungithere were some rpm-related changes which merged to the zuul-jobs repo recently, but hours before the problem seems to have started:
funginothing relevant changed in openstack/openstack-zuul-jobs or opendev/base-jobs recently either13:25
fungithere was a zuul restart, but it concluded at 21:48 and we had successful runs after that13:26
fricklerfungi: isn't that just what we had with cs9 a couple of days ago, syncing from a broken mirror?
fungifrickler: no, these jobs are failing trying to reach the internal address of the rax-dfw mirror from other providers13:32
fungisomething is causing us to not set the correct mirror url on the nodes13:32
fungioh, or maybe i'm misreading it13:34
fungifrickler: you're right, this one is hitting the iweb-mtl01 mirror:
fungii got tunnel-vision after chkumar|rover said "It seems all are coming from"13:35
fungithat's not true at all13:36
fungii'll check the mirror updates, maybe it's just a stale centos-8-stream mirror13:36
fungimirrors for that are up to date though:
fungibut it could be we're successfully mirroring from a stale source this time, instead of one that's refusing connections13:38
fungicurrently we mirror that from rsync://
fungiyeah, 404 for the same files there, e.g.
fungii missed chkumar|rover's proposed change, saw his comments in here and started troubleshooting13:44
* fungi sighs13:44
fungithis is what i get for trying to fix things before coffee13:45
fungii do note that the files it's complaining about don't appear on facebook's mirror either:
fungiseems to match what we're serving:
*** ysandeep|brb is now known as ysandeep13:47
fungichkumar|rover: are you sure switching to the facebook mirror will address this problem? everything there looks the same to me as what we're already serving13:50
rlandy|ruckfungi: hi chkumar|rover is out at the moment13:57
fungisame content on cern's mirror too:
rlandy|ruckfungi: chkumar|rover was debugging with amoralej in their morning14:00
fungithe repomd.xml file is identical across all of those mirrors too14:00
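One way to make the "identical across all of those mirrors" check repeatable is to hash each copy of the file and group mirrors by digest. The sketch below assumes the file contents have already been downloaded; the helper names and the sample data are hypothetical and not taken from the log.

```python
import hashlib
from collections import defaultdict


def group_by_digest(copies):
    """Group mirror names by the SHA-256 digest of their copy of a file.

    `copies` maps a mirror name to the raw bytes fetched from it.
    """
    groups = defaultdict(list)
    for mirror, content in copies.items():
        groups[hashlib.sha256(content).hexdigest()].append(mirror)
    return dict(groups)


def all_identical(copies):
    """True when every mirror served byte-for-byte the same file."""
    return len(group_by_digest(copies)) <= 1


# Hypothetical placeholder data standing in for downloaded repomd.xml files.
sample = {
    "ours": b"<repomd>placeholder</repomd>",
    "upstream": b"<repomd>placeholder</repomd>",
    "other": b"<repomd>placeholder</repomd>",
}
print(all_identical(sample))
```

If the mirrors disagreed, `group_by_digest` would show which ones share a copy, which helps tell a stale mirror apart from a stale source.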
rlandy|ruckchecking the LP bug14:01
rlandy|ruckhmmm .. maybe the mirrors updated  - we are starting to see some jobs run - rechecking the content provider job14:04
fungirlandy|ruck: looking through our mirror-update logs, it looks like there may have been a significant mirror update at leaseweb which caused the files in question to be removed at 06:55 utc, and then replaced at 12:57 utc14:16
rlandy|ruckfungi: why all this upheaval with mirrors all of a sudden?14:17
fungigreat question. do you know anyone maintaining the primary centos mirror infrastructure? maybe there was a large file change in the past 24 hours14:18
fungifrom what i can tell looking at it seems like the indices there were changed to different ones and then changed back14:21
fungilike someone updated content and then undid/reverted it14:22
fungilikely with some number of hours delay as we're several mirrors down the chain from the primary14:23
fungiit's also possible, since we built some new centos-8-stream images during that window, that we'll need fresh images now that the mirrors are back in sync14:31
rlandy|ruckfungi: I'll ask our rdo team - they have some better connections with the centos team14:36
fungithanks, it could help confirm our hypothesis i guess14:36
rlandy|rucklooks like things are progressing now14:37
rlandy|ruckI see one successful content-provider on c814:37
fungii've gone ahead and issued a delete for the centos-8-stream image we built ~9.5 hours ago, since it only deployed to roughly a third of our providers14:38
rlandy|ruckI think chkumar|rover is betting on the fact that the facebook mirrors are more stable14:39
rlandy|ruckbut in this case14:39
rlandy|ruckit would not have helped, iiuc14:39
fungiwell, it might have helped, but maybe by happenstance if it had updated sooner than the one we're copying from currently14:39
fungithe mirror we had a problem with for stream 9 was a completely different operator than the one we're using for stream 8, and failed in a different way (it spontaneously began refusing rsync connections)14:40
fungiin that case switching mirrors was obviously necessary14:42
opendevreviewJames E. Blair proposed opendev/ master: Add google site verification
*** frenzyfriday|ruck is now known as frenzyfriday15:16
*** bhagyashris is now known as bhagyashris|ruck15:17
*** rcastillo is now known as rcastillo|rover15:18
opendevreviewMerged opendev/ master: Add google site verification
*** ysandeep is now known as ysandeep|away15:25
*** marios is now known as marios|out16:03
clarkbfungi: re gerrit gitea links, I guess the next step is to see what feedback we get on a gerrit bug and then if they are happy with my patch I can clean it up and push it?16:17
fungiyeah, the prototype seems viable to me, but if you want to check out the links in polygerrit on the held node too that's probably good16:32
clarkbfungi: ya I checked the links on the page you linked to16:33
clarkbit looks like a sha1 to me though I guess I didn't check it is the right sha116:33
fungicool, so no obvious issues16:33
clarkblet me double check the value looks correct16:33
fungiright, it does seem to match the hash on the other link which was already correct16:34
clarkbya the values appear correct to me. So ya I think we file a bug (has that happened yet?) with an indication we can write a patch and ask for clarification if adding a new value like ${hash} is their preferred method16:35
fungii did not have time to dig up my google login to file anything yet, it's been a morning full of meetings and fires16:35
clarkbno rush on my end. I'm catching up on a bunch of zuul stuff myself16:37
AlekseiPavlovHello everyone! I have a question about gerrit. We make commits to openstack/cinder, but gerrit doesn't send feedback when my build in jenkins finishes successfully, though it does send a message if the build fails. Why?16:42
clarkbAlekseiPavlov: what specific feedback are you expecting? ssh event stream events?16:44
AlekseiPavlovno, just a message in Change log16:45
clarkbAlekseiPavlov: are you trying to vote Verified +1 on successful results? cinder limits which accounts can do this via the cinder-ci group,members which is currently empty16:46
clarkbyou need to avoid voting. I assume it is trying to +1 and that is why this isn't working as expected16:46
AlekseiPavlovprobably yes, i'll try to configure gerrit plugin to avoid voting16:47
AlekseiPavlovno, it didn't help 16:49
AlekseiPavlov i wanna get message like this, but about success build 16:49
clarkbAlekseiPavlov: is Jenkins configured to talk to gerrit via ssh or http?16:50
AlekseiPavlovclarkb, thanks16:50
AlekseiPavlovi fix my problem16:50
clarkbonce you have identified the method you can try reproducing directly16:50
clarkbah ok. what was the issue?16:50
AlekseiPavlovi forgot to delete the 1 in verify in the jenkins job configuration for the gerrit plugin )16:52
fungiokay, meetings are done. if there's no other fires i'm going to try to help figure out the new pbr test regressions16:57
clarkbfungi: were those mentioned somewhere?16:59
fungii reproduced them yesterday with 825380 and am trying to reproduce them locally now16:59
fungilooks like something has started to cause unit tests for >=py37 to fail17:00
fungisome uwsgi related tests and a couple others17:00
fungioddly py27-py36 are passing17:00
fungiso probably a change in some dep which released a new version dropping 3.6 support and the successful runs are using an older release17:01
clarkbin the case of PBR just about the only dep is setuptools17:01
fungithough also there's changes in stdlib, e.g. importlib stuff17:03
clarkbits looking for text in stdout running the package install that indicates a script was installed17:04
clarkbsetuptools latest requires python >=3.717:05
clarkb appears to be the most recent that supports python 3.617:05
clarkbthe code to emit the text that pbr is looking for is in setuptools latest and 59.6.0 so likely something more subtle17:07
clarkbunrelated, my git uses unicode symbols like → in checkout output now17:09
fungii hope there's a .gitconfig option to disable17:09
clarkbfungi: I suspect #2974 and/or #2973 based on the git log in setuptools17:16
clarkbthough I suppose it is possible the code is just not running and the issue isn't related to logging17:21
clarkbfungi: I'm fairly certain this is a logging issue because the rpm_version command emits 0.0.0 which is via a print() but it doesn't emit the output17:25
clarkbthat indicates to me that the code is executing and changes to setuptools logging (as indicated with those issue/PR numbers) are likely to blame17:25
clarkbI suspect set_threshold is to blame. They are doing that to scale up distutils.log 1,2,3,4,5 values to 10,20,30,40,50 python logging values. But if the values coming in are already scaled up then you'll end up with 100,200,300,400,500 and nothing will be logged17:27
clarkbhrm except the distutils.log _log method has a guard against that and raises a value error if the level is not one of 1,2,3,4,517:31
*** jpena is now known as jpena|off17:37
clarkbok its definitely related to the setuptools version17:48
fungidowngrading setuptools fixes it?17:48
fungier, works around it i mean17:49
clarkbyes and now with hacky debugging I think something is setting the threshold to 2 not 1. INFO is 117:52
clarkboh wait INFO is 217:53
clarkbverbosity == 1 == INFO == 217:54
fungiso we need to up the verbosity?17:55
fungier, no if it's like syslog's loglevels then we need a threshold lower or equal to the message's loglevel to capture it17:56
clarkbya something is bumping up the threshold and I'm trying to understand that17:57
clarkbwe are going from a threshold of 2 to 3. INFO is 2 so it gets silenced17:57
fungimakes sense17:57
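The threshold behaviour being discussed can be modelled in a few lines. This is a simplified stand-in written for illustration, not the actual distutils.log source: it keeps the 1 to 5 level numbering and the default of WARN mentioned in the conversation, and records messages in a list instead of printing them.

```python
# Level numbering as discussed in the log: INFO is 2, WARN is 3.
DEBUG, INFO, WARN, ERROR, FATAL = 1, 2, 3, 4, 5


class Log:
    """Simplified model of a threshold-filtered logger (illustrative only)."""

    def __init__(self, threshold=WARN):
        self.threshold = threshold
        self.emitted = []

    def log(self, level, msg):
        if level not in (DEBUG, INFO, WARN, ERROR, FATAL):
            raise ValueError(f"{level!r} wrong log level")
        # A message is kept only when its level reaches the threshold.
        if level >= self.threshold:
            self.emitted.append(msg)

    def info(self, msg):
        self.log(INFO, msg)


quiet = Log(threshold=WARN)
quiet.info("changing mode of script")    # dropped: INFO (2) < WARN (3)

verbose = Log(threshold=INFO)
verbose.info("changing mode of script")  # kept: INFO (2) >= INFO (2)

print(quiet.emitted, verbose.emitted)
```

With the threshold at 3, INFO messages never reach the output, which is the silence pbr's tests were tripping over.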
clarkboh hrm no this is even more complicated. I think something is trying to set it to 2 but failing and so we stay at 3?17:58
fungiin pbr/
clarkbno in setuptools/distutils17:59
clarkbearly on something is calling set_threshold to increase verbosity to an INFO level. But later when we try to actually log the message the Log.threshold value is the original higher value17:59
clarkbhardcoding the default level to INFO from WARN fixes it. I feel like this is some sort of order of operations thing introduced by that monkey patching change18:07
fungiso the expectation is this arrived in setuptools 60.2.0 (the first release that commit appeared in)18:10
fungi"Setuptools now relies on the Python logging infrastructure to log messages. Instead of using distutils.log.*, use logging.getLogger(name).*" (from the changelog)18:11
fungiwe do a lot of "from distutils import log" in pbr18:11
clarkbdistutils.log is still the correct interface according to the legacy migration doc. I don't think the python logging stuff is fully there even though the changelog says to use it. Also older setuptools won't have proper python logging set up18:15
fungiright, we'd have to maintain some sort of split logic to handle that18:15
clarkbI think that something is creating a new distutils.log.Log() object after verbosity is being set which causes it to fall back to its default value of WARN18:16
clarkbbut when I try to log that I get exceptions in some io writer thing because that apparently messes with stdout18:16
fungiclarkb: fwiw, using setuptools 60.1.1 doesn't fix those tests for me, or maybe doesn't fix all of them (could be we have more than one problem)18:19
fungithough it did still give me the same rpm version test error, so i think it's a regression earlier than 60.2.0 where that pr landed18:22
fungii'm trying the last 59.x.x now for comparison18:23
clarkbfungi: ya commenting out the monkey patch didn't fix it. I have confirmed that the id() of the Log object that set_threshold is called on is different than the id() of the object that is called on later18:25
clarkbthe issue definitely seems to be that logging is set up and the threshold is set, then somewhere along the line a new object is created and the updated config is no longer valid18:25
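The symptom clarkb describes, a threshold set on one object and then lost because a different object does the logging, can be reproduced with a toy class. The class below is a hypothetical stand-in, not setuptools or distutils code; it only shows why differing id() values matter.

```python
INFO, WARN = 2, 3


class Log:
    """Toy logger whose only state is its threshold (illustrative only)."""

    def __init__(self, threshold=WARN):
        self.threshold = threshold

    def would_emit(self, level):
        return level >= self.threshold


# Early on, verbosity is raised on the instance that exists at the time.
configured = Log()
configured.threshold = INFO

# Later a fresh instance is created (for example because the module holding
# it was loaded a second time); it starts from the default again.
later = Log()

print(id(configured) != id(later))   # two distinct objects
print(configured.would_emit(INFO))   # True: this one was configured
print(later.would_emit(INFO))        # False: this one still defaults to WARN
```

Any configuration applied to the first object is invisible to code that logs through the second one.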
fungisetuptools 59.8.0 doesn't exhibit this, at least. i'll try to bisect18:26
fungiusing 60.0.5 next18:27
fungii have a feeling we're looking at something which changed in either 60.0.0 or 60.1.018:28
clarkb34defd4d420e31e7c4cefe3e44145c6fe8b41eca maybe18:29
clarkbhrm no that is test only18:30
fungiunfortunately my workstation is a bit underpowered/overloaded for running pbr unit tests, so i get a fair number of fixtures._fixtures.timeout.TimeoutException test failures18:31
clarkbyou can run it against a single failing test18:33
clarkbtest_custom_rpm_version_py_command is the one I've been using18:33
fungiyeah, seeing the same failures in 60.0.518:33
fungiwhat's the stestr syntax for that?18:33
clarkblooks like 60.0.0 - 60.0.3 were straight up broken in setuptools due to a missing variable. and 60.0.4 exhibits this issue18:33
clarkbfungi: I do tox -e py38 -- test_custom_rpm_version_py_command18:34
fungiokay, so it could be anywhere between working 59.8.0 and failing 60.0.418:34
fungibisecting in projects without a green trunk is... unpleasant18:35
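The search being done by hand here is an ordinary bisection between a known-good and a known-bad point. A minimal sketch follows, assuming an ordered list of candidates and a test function that can classify each one; both the function name and the sample data are hypothetical, and it does not handle candidates that cannot be tested at all, which is the difficulty fungi is complaining about.

```python
def first_bad(candidates, is_bad):
    """Return the first failing candidate in an old-to-new ordered list.

    Assumes candidates[0] is known good and candidates[-1] is known bad,
    and that every candidate after the first bad one is also bad.
    """
    lo, hi = 0, len(candidates) - 1   # invariant: lo is good, hi is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(candidates[mid]):
            hi = mid
        else:
            lo = mid
    return candidates[hi]


# Hypothetical example: ten commits, where the regression lands in commit 6.
commits = list(range(10))
print(first_bad(commits, lambda commit: commit >= 6))
```

Each step halves the remaining range, so even a long list of commits needs only a handful of test runs.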
*** amoralej is now known as amoralej|off18:35
fungimaybe backporting the fix from 60.0.4 for the missing variable to earlier releases will help me make progress18:36
fungiand could their commit messages be any more useless?18:37
fungii guess it's eba2bcd "Add support for 'platsubdir'. Fixes pypa/distutils#85."18:38
clarkbya I think that is it18:38
clarkbalso good idea18:38
fungiat least that's a very simple patch18:39
clarkb60.0.0 with that patch applied fails with this issue18:40
clarkbI think that means it is between 59.8.0 and 60.0.018:40
fungiokay, so now we're down to bisecting commits between those two tags, yeah18:40
clarkb#2896: Setuptools once again makes its local copy of distutils the default. To override, set SETUPTOOLS_USE_DISTUTILS=stdlib.18:41
clarkbI wonder if that is it. The vendored copy is just broken18:41
clarkbyes passes if I set that var18:41
clarkbso ya something about loading distutils is breaking the log level settings18:42
fungiyeah, b6fcbbd0 "Restore local distutils as the default." looks like the only substantive change there18:42
fungitesting with SETUPTOOLS_USE_DISTUTILS=stdlib and latest setuptools now18:43
fungiyep, that solves it18:44
fungijust adding it to the setenv in [testenv]18:45
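Outside of tox, the same workaround amounts to setting the SETUPTOOLS_USE_DISTUTILS variable in the environment of whatever process runs the tests. The sketch below shows one way to do that from Python; the helper names are hypothetical, and only the variable name and its value come from the log.

```python
import os
import subprocess


def env_with_stdlib_distutils(base=None):
    """Copy an environment and ask setuptools to use the stdlib distutils."""
    env = dict(os.environ if base is None else base)
    env["SETUPTOOLS_USE_DISTUTILS"] = "stdlib"
    return env


def run_with_stdlib_distutils(cmd):
    """Run a command (a list of arguments) with the workaround applied."""
    return subprocess.run(cmd, env=env_with_stdlib_distutils(), check=False)


# Example with a made-up base environment; the original mapping is untouched.
base = {"PATH": "/usr/bin"}
print(env_with_stdlib_distutils(base))
```

For example, `run_with_stdlib_distutils(["tox", "-e", "py38", "--", "test_custom_rpm_version_py_command"])` would run the single test mentioned earlier with the variable set, assuming tox passes that variable through to the test environment.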
fungiclarkb: does that seem like an acceptable temporary workaround?18:45
fungii'll push it up as a wip18:46
clarkbI think its probably ok, this seems to only affect logging18:48
clarkbwhich means pbr will still work elsewhere just less verbosely18:48
fungi with an actual commit message18:51
clarkbfungi: should we file a bug against setuptools?18:51
clarkbSomething along the lines of the switch back to local vendored distutils results in a log level of WARN from distutils.log instead of INFO as was previously the default.18:52
fungilikely. you seem to have a bit better grasp on the specifics of the loglevel bit18:52
fungia minimal reproducer would likely help18:52
clarkbya thats the problem with setuptools. I have no idea how to reproduce any of this without pbr18:53
clarkbthe whole thing is a giant mess of magic and knowing how to plug in is tricky. Maybe the thing to do is find a log message that setuptools itself is already emitting18:53
clarkbhrm no because logging seems to work there :/18:57
clarkbthis is specific to the command hooks then?18:57
fungicould be they didn't notice because they don't test them, so missed adding support18:57
clarkbya I always get lost right at this point. How does the command hook stuff get registered with setuptools? I think the thing that makes it confusing is we do a bunch of PBR side processing that emits it into setup()19:00
fungias soon as i get caught up on list moderating, i'll start putting together the gerrit gitweb config bug report19:02
fungiand then hopefully on to pruning backups19:03
clarkbI'm getting hopelessly lost trying to figure out how our custom command differs from the sdist command19:08
clarkbfungi: ok I can reproduce using the build_scripts command. It emits an info log level entry for chmod19:22
clarkbbut 60.0.0 doesn't do that19:22
fungiat least that gives them something to replicate the problem without risk of them just blaming pbr19:25
fungi (for the gitweb config issue)20:34
*** artom__ is now known as artom20:47
fungialso the setuptools distutils workaround for pbr needed a revision, i failed to notice that [testenv:cover] separately overrode setenv21:05
clarkbI'll rereview momentarily21:05
clarkbalso getting together for renames21:06
clarkbdo we know if there are rename changes yet?21:06
clarkbwe'll also append to that the gerrit upgrade plan21:06
corvus#status log manually moved new rebuilds of old zuul docs into position on project.zuul afs volume21:15
opendevstatuscorvus: finished logging21:15
clarkbfungi: should we split in two or force merge it?21:16
clarkbspecifically they have added the project to the zuul projects.yaml config and the projects aren't known by zuul yet21:16
clarkbianw: I put a rough outline of the gerrit upgrade process on feel free to update that with more info21:17
fungiclarkb: i'd be okay with trying to split it, i think previously we bypassed testing to merge those21:19
clarkbok I'll split it then21:21
opendevreviewClark Boylan proposed opendev/project-config master: Record project renames happening January 24, 2022
clarkbgmann: fungi: one thing I notice is that skyline is adding the publish to pypi jobs but I think we've established their source code repos won't work that way?21:25
clarkbI'm going to push the updates anyway, but I'm warning openstack here that those jobs may not do what you think they do21:25
ianwclarkb: thanks, looking21:25
opendevreviewClark Boylan proposed openstack/project-config master: Rename skyline/skyline* project to openstack/skyline*
opendevreviewClark Boylan proposed openstack/project-config master: Configure renamed skyline projects with openstack jobs
fungiclarkb: yeah, it's probably fine for them to have the pypi release template, with the understanding that there's no point in them making release requests for anything until they're in a state where those jobs will actually work21:44
clarkbinfra-root Luca identified an issue with gerrit that could cause runtime errors of plugins. I've tried grepping our error log for some of the strings that show up in his tracebacks21:44
clarkbThere is an unmerged fix which I'll propose to our gerrit builds momentarily so that we are prepared with that should it become necessary21:45
opendevreviewClark Boylan proposed opendev/system-config master: Fixup Gerrit deps
clarkbinfra-root ^ that is the gerrit thing. I've added links to the things and a suggested plan. TL;DR is since we don't seem to be broken right now we can wait until tomorrow when the upstream fix should be landing. Then rebuild our images. If upstream doesn't land the fixup we can land this patching fix21:58
clarkbOr I suppose if we discover we are broken go ahead and land one or the other depending on how far along it is22:00
corvushowdy, i would like to rolling-restart zuul soon22:02
clarkbno objections from me22:03
corvus(just waiting a few mins for promote to finish)22:03
clarkbwill you be restarting executors non gracefully again?22:03
clarkbif so I can warn the release team22:03
fungisounds good to me22:03
corvusi'm mulling on that ... maybe we have enough time to graceful them today?22:03
clarkbhard to say, our timeout is 3 hours so it may stretch at least that long?22:04
corvushrm good point22:04
corvusmaybe we should just non-graceful22:04
fungiour timeout is 3+3 hours technically, right?22:04
fungi(pre/run + post)22:04
clarkbfungi: I think post is a shorter timeout but ya its 3 hours + some value22:05
clarkbI can warn the release team22:06
fungioh, i guess at some point the post-run timeout was split out to a separate config option rather than just using the one that covers the other phases22:06
clarkbfungi: our issue is a dup of
clarkbThe debugging on that issue was a little more in depth and shows two instances of the module which explains why we set the threshold then it gets ignored22:23
gmannclarkb: yeah, it's a TODO for them to work on. we are aware of those limitations22:24
gmannlet's see when they start working on those 22:24
clarkbgmann: ok just calling it out since the jobs are added22:26
corvusokay promotes are done, starting the zuul maint now22:28
corvuspulling images22:28
clarkbfungi: project-rename is the topic you expect for the rename changes right? I went ahead and set those values and also updated the wiki entry so that hopefully people see it there in addition to our docs22:29
fungiyeah, i think that's what our documentation says to use22:31
clarkbOk I think the rename portion of the maintenance is pretty much ready for people to look it over then.22:32
clarkb*of the maintenance etherpad22:32
corvusrestarting zuul01 scheduler22:33
*** dviroel is now known as dviroel|out22:34
corvuslooking good.  i'm going to restart zuul02 scheduler and web now.22:44
corvusthey're back up; going to restart mergers now22:56
corvusdone; now going to hard-restart executors22:57
corvus#status log performed (mostly) rolling restart of zuul on commit 548eafe0b5729e78ab4024abea98b326678d83d823:00
opendevstatuscorvus: finished logging23:00
*** rcastillo|rover is now known as rcastillo|out23:01
*** ysandeep|away is now known as ysandeep23:52
