Wednesday, 2021-08-11

*** cloudnull7 is now known as cloudnull00:23
opendevreviewIan Wienand proposed opendev/system-config master: gerrit docs : remove old database docs/update duplicate account info
ianwclarkb / fungi: ^ I think that covers what I just did01:22
*** ykarel|away is now known as ykarel04:46
opendevreviewMerged openstack/diskimage-builder master: Introduce openEuler distro
*** marios is now known as marios|ruck05:11
*** bhagyashris_ is now known as bhagyashris05:39
*** rpittau|afk is now known as rpittau07:23
*** jpena|off is now known as jpena07:32
yoctozeptomorning infra; related question to one of my recent ones - could we configure gerrit to allow the (new, gerrit's own) wip flag to be lifted by project cores?07:47
opendevreviewMerged opendev/elastic-recheck rdo: Run elastic-recheck in container
opendevreviewAnanya proposed opendev/elastic-recheck rdo: Make elastic recheck compatible with rdo elasticsearch
opendevreviewAnanya proposed opendev/elastic-recheck rdo: Make elastic recheck compatible with rdo elasticsearch
opendevreviewMerged openstack/project-config master: afsdocs_secret-tox-docs-site: Zuul 4.6.0 fix
opendevreviewAndreas Jaeger proposed openstack/project-config master: Adjust secrets for developer.o.o
*** rpittau is now known as rpittau|afk11:28
*** dviroel|out is now known as dviroel11:32
*** jpena is now known as jpena|lunch11:35
*** jpena|lunch is now known as jpena|off12:28
opendevreviewAnanya proposed opendev/elastic-recheck rdo: Make elastic recheck compatible with rdo elasticsearch
*** ykarel is now known as ykarel|away13:23
opendevreviewAnanya proposed opendev/elastic-recheck rdo: Make elastic recheck compatible with rdo elasticsearch
opendevreviewAnanya proposed opendev/elastic-recheck rdo: Make elastic recheck compatible with rdo elasticsearch
tristanCcorvus: clarkb: it seems like is blocked from the internet, can we open the port (and how?). Otherwise could you please paste the output of `curl`13:56
opendevreviewAnanya proposed opendev/elastic-recheck rdo: Make elastic recheck compatible with rdo elasticsearch
fungiyoctozepto: that would have to be configured on a per-project basis. if you can find the right access control in the gerrit docs for it, propose a change and we'll review and try it out14:07
yoctozeptofungi: ack, I hoped someone tried that already :-) I will try it when I need it again and have time to research gerrit :D14:08
corvustristanC: looking14:19
corvustristanC: was this working earlier?  (i thought this was how you discovered the connection issues)14:22
fungi#status log Killed an htcacheclean process on which had been squatting the flock since 2021-07-21, and then cleanly restarted the apache2 service since at least one of its workers logged a segfault in dmesg at 10:18:55 UTC when its cache volume filled completely14:24
tristanCcorvus: it was not enabled in opendev (though i used the endpoint to diagnose the issue locally)14:24
opendevstatusfungi: finished logging14:24
corvustristanC: oh, so that graph was from a locally running copy?14:24
fungigiven the timing, i suspect the hung htcacheclean on the bhs1 mirror was related to our afs server restarts14:25
corvusfor the issue at hand -- it looks like the base playbook has exited with an error, however, the logs indicate that it did write the iptables config on eavesdrop01.  but the file on disk does not have 9001 in it.  so it's looking like somehow the change to add the port isn't working as expected.14:26
tristanCcorvus: yes14:27
opendevreviewJames E. Blair proposed opendev/system-config master: Test port 9001 on eavesdrop
corvustristanC: ^ i will be curious what the results of those tests are14:28
corvustristanC: meanwhile, i'll try to make a small playbook to manually run to help debug14:29
opendevreviewAnanya proposed opendev/elastic-recheck rdo: Make elastic recheck compatible with rdo elasticsearch
opendevreviewJames E. Blair proposed opendev/system-config master: Remove eavesdrop from webservers group
corvustristanC: ^ i suspect that's the problem.  if it is, then 804247 will fail tests and 804255 will pass.  if that happens, we can squash them.14:48
tristanCcorvus: nice, thank you for looking into that!14:48
opendevreviewJames E. Blair proposed opendev/system-config master: Remove port 22 from webservers extra ports
corvusand that's unrelated cleanup ^14:49
*** jpena|off is now known as jpena14:57
clarkbyoctozepto: fungi: there is an acl for it, but unfortunately the gerrit docs don't tell you what the raw contents are just describe the high level objects since they expect people to edit them via the web ui15:03
clarkbyoctozepto: fungi: I think you can create a change in the web ui and view the diff of that then abandon the change in the web ui and propose it to project-config though15:03
yoctozeptoclarkb: I don't know about the web ui :-(15:05
clarkbyoctozepto: if you view the acls for a repo there should be an edit button that allows you to make changes with a gui then propose them for review. We don't use that process but if you do that you get diffs you can apply to project-config acl files iirc15:05
clarkbcorvus: do you need to squash the test and group changes together in order for the test to pass?15:06
corvusclarkb: the group is a followup so it should pass.  we will need to squash to merge (but i want to see the test fail first)15:08
clarkbgot it15:08
corvushrm, the job failed, but it failed restarting apache?15:12
corvusi've rechecked, but i didn't expect or understand that failure15:14
yoctozeptoclarkb: I seem unable to do anything on,access it's read-only15:14
clarkbyoctozepto: that must be a change since we upgraded gerrit. That is unfortunate. In that case we probably have to go read some gerrit source instead15:15
clarkbyoctozepto: is the documentation fwiw15:15
yoctozeptoclarkb: yeah, but it gives no clues15:16
yoctozeptodon't worry though, it's no priority for us at the moment ;-)15:16
clarkbyoctozepto: public static final String TOGGLE_WORK_IN_PROGRESS_STATE = "toggleWipState";15:16
clarkbyoctozepto: I have a checkout so grepping was easy :) I think that is the string you use to indicate the permission you want15:17
clarkbthen it is like create, push, etc15:17
yoctozeptoclarkb: ack, many thanks; then I will propose a change later15:19
yoctozeptonow in a meeting15:19
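
[Editor's sketch] The permission string clarkb found, `toggleWipState`, would presumably be written like any other permission line (`create`, `push`, and so on) in a project's ACL file. Below is a minimal Python sketch of adding such a line to ACL text. The section name `refs/heads/*`, the group name `demo-core`, and the helper `add_permission` are illustrative assumptions, not taken from the log, and the correct ref section should be checked against the Gerrit access-control documentation.

```python
def add_permission(acl_text, section, permission, group):
    """Return acl_text with `permission = group <group>` present in `section`.

    Hypothetical helper: it only edits text and does not validate the
    permission name against Gerrit.
    """
    header = f'[access "{section}"]'
    entry = f"{permission} = group {group}"
    out = []
    in_section = False
    added = False
    for line in acl_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # Leaving the target section: add the entry before the next header.
            if in_section and not added:
                out.append(entry)
                added = True
            in_section = stripped == header
        elif in_section and stripped == entry:
            added = True  # already present, nothing to add
        out.append(line)
    if in_section and not added:
        # Target section was the last one in the file.
        out.append(entry)
        added = True
    if not added:
        # Section was missing entirely: create it at the end.
        out.extend([header, entry])
    return "\n".join(out) + "\n"


# Assumed sample ACL content, for illustration only.
SAMPLE_ACL = (
    '[access "refs/heads/*"]\n'
    "abandon = group demo-core\n"
    "label-Code-Review = -2..+2 group demo-core\n"
)

if __name__ == "__main__":
    print(add_permission(SAMPLE_ACL, "refs/heads/*", "toggleWipState", "demo-core"))
```

The helper is idempotent, so running it twice on the same text leaves a single entry.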
*** sshnaidm is now known as sshnaidm|afk15:35
*** jpena is now known as jpena|off15:42
clarkbanyone know why zuul didn't run jobs against ? I'm going to recheck it. Its jobs are defined in openstack/project-config16:11
clarkbrechecking it does seem to have queued the docs job16:14
*** dviroel is now known as dviroel|away16:19
fungicould have been pushed during a zuul restart i guess?16:27
fungimmm, no there was no restart in progress at that time16:28
*** ykarel is now known as ykarel|away16:33
opendevreviewClark Boylan proposed zuul/zuul-jobs master: Find (s)testr more reliably
roman_gHello team. CityCloud reports that they are experiencing ongoing issues in KAN1 region, their Glance API is crashing and needs to be restarted multiple times during last hours. Their operations team says that this is somehow connected to the operations being done under opendevtest account. Account has created 391 images today, and that might be the reason of glance failures, which causes problems to City Networks.17:09
roman_gPlease, check what is going on there. Thank you.17:09
clarkbroman_g: nodepool will restart its uploads if previous ones fail which could cause a feedback loop17:10
clarkbwe can disable the cloud region and stop uploads but I strongly suspect this isn't the cause of the problem17:10
clarkbI'm not sure what there is to check on our end other than it is likely failing in a loop17:11
roman_gAre there any more details available to you on why and on which operations their Glance is failing?17:11
clarkbroman_g: the cloud should have that information available on their end... but we can look17:11
roman_gThank you. I will forward information that we are in a loop to them.17:12
clarkbroman_g: openstack clients typically get very terse error messages and to do any real debugging you have to look at the cloud side fwiw17:14
roman_gOK. Requesting.17:14
roman_gThank you, clarkb.17:14
clarkbroman_g: keystoneauth1.exceptions.connection.ConnectFailure: Unable to establish connection to ('Connection aborted.', BrokenPipeError(32, 'Broken pipe'))17:15
clarkb"Broken Pipe"17:15
roman_gInteresting. Thank you.17:15
opendevreviewClark Boylan proposed openstack/project-config master: Disable airship citylcoud nodepool provider
clarkbinfra-root ^ is that the correct way to stop image uploads?17:19
roman_gEndpoint responds appropriately, but I don't have creds to test, so getting http 401 not authorized.17:20
clarkbroman_g: the way that nodepool works for images is it builds a new image for each image type daily then will upload it to all the clouds in a loop17:20
clarkbSo ya us creating a bunch of images is expected if image creations fail, but I doubt that we are the source of the problem. Possibly amplifying it or making it more visible17:20
corvusclarkb: lgtm17:21
fungiclarkb: we should be able to interactively pause?17:21
fungithough maybe that only pauses builds not uploads17:22
fungiroman_g: based on the error, it looks like we get partway through uploading the image and then the connection is terminated17:22
clarkbfungi: is that documented? I'm not finding how that works with grepping pause in the docs17:23
mordredwe've seen issues like that with clouds before due to unhappy load balancers - the lb will terminate on a timeout, and then we'll happily retry the upload which will still trigger the timeout - enter an endless loop17:23
fungithough "partway" may be as few as 0 bytes, i can't really tell17:23
fungiclarkb: doesn't seem to have included docs, just usage output for --help17:25
fungiand i guess it's for pausing an image not just a provider17:26
clarkboh ya we want to do a specific provider in this case17:26
fungiyep, so what you have is what we want i think17:27
clarkbI'll go ahead and manually modify nb01 and nb02 as well now that the change is approved17:28
opendevreviewRadosław Piliszek proposed openstack/project-config master: Allow kolla cores to toggle kolla wipstate
clarkbalright it has been manually applied on the two builders now17:32
clarkbroman_g: ^ should stop retrying for a bit though ansible may put the old config back again depending on when the change above merges and when our hourly updates happen17:33
roman_gclarkb thank you. I'm waiting for reply from provider. See you tomorrow.17:34
roman_gI will read evening logs.17:34
yoctozeptoclarkb: thanks again and here is the patch to merge:
yoctozeptoit seems ooo already used it17:35
fungiyoctozepto: oh! good, so you can ignore my comment about it probably causing a failure in the validator, we must have already solved it when tripleo decided to give it a try17:37
yoctozeptofungi: yeah, I replied on the review as well17:38
yoctozeptoI'm glad it went this smooth again17:39
opendevreviewMerged openstack/project-config master: Disable airship citylcoud nodepool provider
yoctozeptothanks mordred17:42
* mordred has been useful for the week17:42
fungiand it's only wednesday!17:43
jrosseris there a fix for git review -> error: remote unpack failed: error Missing tree 3e154d5146909cb52cf17b71f8a6630448aab48517:55
opendevreviewMerged openstack/project-config master: Allow kolla cores to toggle kolla wipstate
clarkbjrosser: sort of. The issue is in jgit not git review17:56
jrosseri've had this a couple of times today17:56
clarkbjrosser: latest git review has a --no-thin flag which can be used when you have that problem to workaround it17:56
clarkbthe flag shouldn't be used always as it is far more computationally intensive but when you hit this issue can be supplied17:56
jrosseronce after trying to propose a revert of a patch straight after proposing it17:57
fungiand that needs git-review 2.1.0 just fyi17:57
jrosserand just now i did a trivial fix to the commit message with commit --amend17:57
clarkbjrosser: yes reverts are apparently involved in one of the known reproduction cases, but upstream closed the bug and told me to go away when I asked them to reopen it :/17:57
jrosserdoh :(17:57
clarkbjrosser: is the bug, the issue is in jgit. git review is an innocent bystander17:58
clarkbbut ya try the --no-thin flag17:58
jrosseroh thats awesome, --no-thin worked straight away17:58
fungiwhich is essentially a passthrough to git push's --no-thin option17:58
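
[Editor's sketch] The workaround described above (use `--no-thin` only when the "Missing tree" failure appears, because it is more computationally expensive) could be wrapped roughly as follows. This is a minimal Python sketch: the helper names `needs_no_thin`, `review_command`, and `submit` are hypothetical, and the retry policy is an assumption rather than anything git-review does itself. Per fungi, the flag needs git-review 2.1.0.

```python
import subprocess

# Substring of the failure jrosser pasted in the channel.
MISSING_TREE_MARKER = "remote unpack failed: error Missing tree"


def needs_no_thin(output):
    """True when the push output contains the 'Missing tree' unpack failure."""
    return MISSING_TREE_MARKER in output


def review_command(no_thin=False):
    """Build the git-review argv, optionally with the --no-thin flag."""
    cmd = ["git", "review"]
    if no_thin:
        cmd.append("--no-thin")
    return cmd


def submit(run=subprocess.run):
    """Run git review; retry once with --no-thin only on the known failure.

    `run` is injectable so the logic can be exercised without a real repo.
    """
    first = run(review_command(), capture_output=True, text=True)
    if first.returncode == 0:
        return first
    if needs_no_thin(first.stdout + first.stderr):
        return run(review_command(no_thin=True), capture_output=True, text=True)
    return first
```

Retrying only on the specific error keeps the normal, cheaper thin-pack push as the default path.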
clarkbjrosser: is a recent thread on it if you want to read up on what I said to upstream17:59
clarkbthere is another workaround we can do but it presents a security issue so we don't17:59
fungii love that they closed the bug report because it "looks pretty old" and even though the message invited people to report if they could reproduce it on recent gerrit versions they told you privately that they didn't care?18:01
clarkbfungi: no they told me in the thread I linked above18:02
clarkbbasically because I don't have a set of reproduction steps the bug isn't worth keeping open18:03
clarkbI responded with basically "I understand that not having a simple reproduction case makes fixing this harder but the problem seems understood by the JGit maintainers and it still happens with our 3.2 gerrit"18:04
clarkbbasically they set the bar so high that fixing this (what I would consider) major bug isn't important enough18:04
clarkbI asked because I couldn't reopen the bug myself fwiw18:06
fungiclarkb: interesting, that bug's comments skip from 2017 to 2021 so i don't see any from you18:06
clarkbfungi: right it is on the mailing list thread18:06
clarkbI didn't bothcommenting on the bug bceause it is closed and I can't reopenin it18:06
clarkbwow typing is hard. I didn't bother commenting on the bug because it is closed and I can't reopen it18:06
fungii see now, i missed your link to the ml18:07
fungii guess if we could capture an exact remote and local repository state which exhibit the issue, then we might be able to reproduce it with those?18:08
clarkbya or go through the steps to reproduce in the original bug and see if they reproduce on modern gerrit18:09
fungiwhich means we'd need to tell someone to not work around it, tar up their local repository, snapshot the bare repo on the gerrit side, and then load those up in a test deployment18:09
clarkbI personally don't feel like users should be asked to go through that effort18:10
clarkbif you read the thread there is clear indication that jgit understands the exact problem18:10
clarkbgerrit wants to pretend it doesn't happen anymore and force end users to jump over a very high bar18:11
clarkbthey even fixed it once but reverted because it created performance regressions18:11
*** dviroel|away is now known as dviroel19:09
opendevreviewMerged zuul/zuul-jobs master: Find (s)testr more reliably
smcginnisI'm seeing something odd with gerrit queries, wondering if someone might know why.19:54
fungiplease elaborate!19:54
smcginnisIf I get the changes from the API using the json returned only has a small amount only against opendev/ci-sandbox.19:55
smcginnisBut if I query in the UI with, it returns a bunch of reviews.19:55
smcginnisCan't see why they would be different.19:55
fungithose where they voted vs those where they only commented?19:58
fungiit probably helps to add options to expand the response so it contains the comments, i'll check19:58
smcginnisAh, wasn't thinking reviewer vs commentby.19:59
smcginnisThey do show up in the "CC" group now, not in "Reviewers" in the UI.20:00
clarkbreviewer is changes that have been or need to be reviewed by a user and I think gerrit does define that as voting20:00
fungi"REVIEWER: Users with at least one non-zero vote on the change."20:03
smcginnisWeird that this lastcomment-scoreboard code works for most CI's. None of them are voting.20:03
fungi"CC: Users that were added to the change, but have not voted."20:04
smcginnisI'll see if it works with querying by commentby instead.20:05
fungii agree does a poor job of distinguishing those20:06
clarkbthere is also reviewedby20:07
fungiyeah, which is subtly different apparently20:07
fungialso i find the lack of alpha-ordering for entries there... maddening20:08
smcginnisLooks like in this case it is equivalent since I have the (theoretical) list of CI IDs, so the script just needs to get a set of recent reviews to inspect.20:08
smcginnisI'll run it with both and compare output to see if there's any difference.20:08
smcginnisNo comments found. CI SYSTEM UNKNOWN  - I'll do some more debugging, but doesn't look like changing that filter to either of those worked.20:31
clarkbsmcginnis: curl -X GET returns a bunch of results for me and seems to match what I get in the dashboard20:34
clarkbcommentby seems to do similar as well20:34
clarkband reviewer. So the UI and API seems to line up20:35
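
[Editor's sketch] For comparing `reviewer:`, `commentby:`, and `reviewedby:` results from a script, a minimal Python sketch of building the REST query URL and decoding the response is shown below. The base URL `https://gerrit.example.org` and the account `ci@example.org` are placeholders (the real URLs were stripped from this log), and the helper names are hypothetical. Two details are worth handling explicitly: the query must be URL-encoded, since it can contain spaces, and Gerrit prefixes JSON responses with `)]}'`, which has to be removed before parsing.

```python
import json
from urllib.parse import quote

# Gerrit prepends this line to JSON responses to prevent XSSI attacks.
XSSI_PREFIX = ")]}'"


def changes_url(base_url, query):
    """Build a /changes/ query URL with the search string percent-encoded."""
    return f"{base_url.rstrip('/')}/changes/?q={quote(query, safe='')}"


def parse_gerrit_json(body):
    """Strip the XSSI prefix, if present, and decode the JSON payload."""
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    return json.loads(body)


if __name__ == "__main__":
    # Placeholder host and account; swap the operator to compare result sets.
    for operator in ("reviewer", "commentby", "reviewedby"):
        print(changes_url("https://gerrit.example.org",
                          f"{operator}:ci@example.org status:open"))
```

Fetching each URL and comparing the returned change numbers would show whether the three operators differ for a given account.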
*** dviroel is now known as dviroel|ruck20:35
smcginnisThe script has spaces in the URL. I wonder if something is causing an issue with that. Still not sure why it is only for this CI and all the other accounts come through fine.20:36
fungiwhen looking at the changes where that account is listed as a reviewer, it does seem to have left a vote at some point on at least one patchset20:38
smcginnisOK, debugged some more and it looks like it is actually failing somewhere in parsing the response. Which then causes a failure that emits CI SYSTEM UNKNOWN, which in this case is a little misleading.20:58
smcginnisThanks for the pointers!20:58
smcginnisIf curious, this is what actually gets called: curl -X GET ""20:58
fungibut only for that account?21:00
smcginnisYeah. There are a few others failing, but they could be really old deactivated accounts. Most are working fine, this is the only one failing in this way that I've identified so far.21:05
smcginnisAppears this does not return anything (empty iterator):
clarkbI deactivated a mellanox cinder ci account recently21:05
clarkbI haven't heard any screaming about it since I did it and from what I could tell it hadn't been used in a long time21:06
smcginnisLooks like the last run must have worked for some mellanox account -
smcginnisThe failing ones in there are mostly expected.21:06
clarkbya I think they consolidated accounts21:06
smcginnisAt least so far, other than the pure one.21:06
clarkbthere was a cinder specific one and now I think they use a generic ci account21:07
corvustristanC, clarkb, ianw: finally failed on the actual issue it should have failed on (attempt #3).  attempts 1 and 2 failed on an apache restart issue related to the limnoria handler.  i don't understand that.  but that also took out the follow up change which should have working tests.21:17
corvusin other words, i think there's a flaw in the eavesdrop playbook which has a 75% chance of causing test failures, but i don't understand it.21:18
corvussee for an example failure21:18
corvusit doesn't look like the jobs save the journal, so we can't see the error21:19
clarkbthat is running as an ansible handler in response to writing out the apache config for the apache server21:20
corvusi guess the limnoria role writes out that config?21:21
opendevreviewJames E. Blair proposed opendev/system-config master: Test port 9001 on eavesdrop
opendevreviewJames E. Blair proposed opendev/system-config master: Remove port 22 from webservers extra ports
clarkbinternet says this can happen if the ports already have things listening on them21:21
clarkbthat seems unlikely here unless it is the old apache process and not stopping reliably in the restart21:22
clarkbthe other possibility is that the new vhost template is invalid for some reason21:24
clarkbmaybe the ssl cert provisioning didn't succeed earlier with the mocked out letsencrypt?21:24
corvusclarkb: yes:
clarkbya I was looking at the ara side and it almost looks like we're not calling the script with the test flag? But I'm reading what the script does now.
clarkbLETSENCRYPT_STAGING env var is how we toggle the staging flag21:28
corvusit's hitting the staging server, but according to the curl output, there's an ssl handshake error with the acme server21:29
clarkboh I see21:29
clarkbso we are staging but we are failing21:29
corvusthat's my understanding; like maybe network error or LE server issues?21:30
clarkbya could be21:30
clarkbthat could also explain why it is fine now. They fixed the upstream issue21:30
corvustrue, success was the latest run21:30
corvuscloud provider does not correlate21:31
corvusokay, i think we should assume it's LE or Internet and see what the next run comes back with21:32
corvusclarkb: thanks :)21:32
clarkblooking through my old changes is one that should be easy to land. We'll just want to confirm the results the next time we boot an instance21:42
*** dviroel|ruck is now known as dviroel|ruck|out21:46
opendevreviewClark Boylan proposed opendev/system-config master: DNM force gitea failure for interaction
clarkbfungi: I have just discovered is a change I pushed to update rename repos after the server upgrade. Notice it removes the mysql steps (sorry I wish I remembered I had pushed this change) but it also does reindexes for groups and projects21:56
clarkbit seems this wasn't strictly necessary after the recent rename we did, but I suspect we do need the groups reindex at least if groups change names21:56
clarkbalso the projects reindex is needed if acls change maybe?21:56
clarkbI'm going to rebase that so that it is mergeable and I suspect we may want to merge it21:57
opendevreviewClark Boylan proposed opendev/system-config master: Add additional post project rename reindexing
clarkbtrying to do some spring cleaning at the end of summer :)22:00
ianwclarkb: just while it's in my mind, would be good if you could read through
ianwupdates docs for removing user emails22:24
clarkbianw: ++ I meant to do that then it slipped my mind. Will do so now to prevent that happening again22:24
ianwno rush; just the relational model between bits tends to fall out of my head fairly quickly :)22:25
clarkbianw: left a few notes. I'm not sure any one rises to a -1 but together it is probably worth an update22:29
ianwso after it's fixed, you can point the externalid at their old account?22:33
clarkbbasically we put the file back in place for the openid then change the accountId in the file to be the old account instead of the new22:35
clarkbcurrently gerrit rejects that push22:35
clarkbbut it should accept them once the conflicts are all removed22:35
opendevreviewIan Wienand proposed opendev/system-config master: gerrit docs : remove old database docs/update duplicate account info
clarkbfungi: ^ did you want to look that over since you did one recently too?22:39
clarkbif not I think we can approve that22:39
fungisure, just a sec22:42
clarkb we may find that interesting, also possible we might learn something from it too22:49
fungiianw: some comments, i'm not opposed, but am mostly concerned about the bits where it's authenticating with you.admin through the rest api22:51
ianwargh, more stable branch debian-stable references ... openstack-ansible-nspawn-container-create-debian-stable23:13
clarkboh ya that reminds me there are a bunch of x/tap-as-a-service zuul config errors in the openstack tenant23:14
clarkbI saw them when trying to sort out why my infra-specs change didn't run jobs23:14
clarkbI know we've said we won't care too much, but it's hard to not notice and want to go force merge a bunch of fixes :/23:14
fungisupposedly at least some of those were going away with branch eols/deletions23:14
opendevreviewMerged opendev/system-config master: gerrit docs : remove old database docs/update duplicate account info
ianwi think codesearch got branch overrides, so we could setup some separate instances to index stable branches23:24
ianwlike codesearch/wallaby/...23:24
