*** ysandeep is now known as ysandeep|out | 00:59 | |
opendevreview | Ian Wienand proposed opendev/system-config master: gitea: set custom avatars for orgs https://review.opendev.org/c/opendev/system-config/+/834085 | 01:16 |
opendevreview | wangxiyuan proposed openstack/project-config master: Correct openEuler mirror URL https://review.opendev.org/c/openstack/project-config/+/834086 | 01:35 |
fungi | clarkb: yeah, 779546 is no longer needed. i've abandoned it | 01:57 |
opendevreview | Ian Wienand proposed opendev/system-config master: gitea: set custom avatars for orgs https://review.opendev.org/c/opendev/system-config/+/834085 | 02:22 |
fungi | i've got the backup pruning in progress in a root screen session | 02:49 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Stop checking the OpenStackID HTTPS cert https://review.opendev.org/c/opendev/system-config/+/834094 | 02:53 |
ianw | clarkb/fungi: https://review.opendev.org/q/topic:opendev-gerrit-retire was what i came up with to retire this repo. unlike other repos where we're happy for the historical changesets to show up forever more, this one seems different so that's why i put the robots block in there | 02:55 |
ianw | s/this repo/the opendev/gerrit repo/ | 02:55 |
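(For context, the robots block being described would be along these lines — the exact path is an assumption, not copied from the actual change:)

```
User-agent: *
Disallow: /opendev/gerrit
```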
fungi | we can also delete its branches prior to retirement if we prefer | 03:00 |
ianw | yeah, i mean, those branches do have some of our history though? | 03:01 |
ianw | maybe nothing worth saving, i guess | 03:01 |
fungi | any history which was done via changes is still reachable by named refs which won't be garbage collected | 03:02 |
opendevreview | Merged zuul/zuul-jobs master: encrypt-file: always import expiring keys https://review.opendev.org/c/zuul/zuul-jobs/+/829853 | 03:15 |
opendevreview | Ian Wienand proposed opendev/system-config master: gitea: set custom avatars for orgs https://review.opendev.org/c/opendev/system-config/+/834085 | 03:30 |
mnaser | i've got a nodeless job that is hitting RETRY_LIMIT -- https://paste.opendev.org/show/bz2Fb6ho3KnJn5bz9q4s/ i'm failing in a "clean" way here (the change in question -- https://review.opendev.org/c/vexxhost/ansible-collection-atmosphere/+/834092 ) | 03:33 |
mnaser | ok, my zuul_return didn't have a |list after the selectattr, maybe that's borking it | 03:35 |
mnaser | nope, not that.. | 03:36 |
mnaser | i'd appreciate it if any admins could check the zuul logs for insight into why it's doing that ;( | 03:39 |
ianw | mnaser: one sec and i'll see if i can find something | 03:43 |
mnaser | ianw: thanks! | 03:45 |
ianw | ok, https://zuul.opendev.org/t/vexxhost/build/ebcb981c62aa4d8d9f78b5a1112c99a4 just failed like that | 03:45 |
mnaser | yep, the logs i got in the paste are because i had the console open | 03:45 |
mnaser | i wonder if my zuul_return is messing it up (i haven't tried to comment it out to test my theory) | 03:46 |
ianw | trying to figure out what executor it went to | 03:49 |
mnaser | ianw: the one i had logs for was `ze08.opendev.org` with event id `44d00029e9ea4e6c89a598b4c2dfd4b8` (thanks "Print job information") | 03:50 |
opendevreview | Merged openstack/diskimage-builder master: Use https for downloading ubuntu images https://review.opendev.org/c/openstack/diskimage-builder/+/833997 | 03:51 |
ianw | that one seems like it was canceled | 03:55 |
mnaser | hmm, maybe since i pushed a new change | 03:57 |
mnaser | i wonder if calling `fail` inside `run` playbook would cause it to mark the job to be retried? | 03:57 |
ianw | mnaser: https://paste.opendev.org/show/b8zzsstaHo8qsAgJzUKX/ | 04:00 |
ianw | that's what's failing | 04:00 |
mnaser | so using the `fail` module inside a run playbook causes it to fail | 04:00 |
mnaser | hrm | 04:00 |
mnaser | so how can i fail a job 'on purpose' after i run the zuul_return? | 04:01 |
* mnaser thinks of tox role | 04:01 | |
mnaser | ok so it is using block + always -- https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/tox/tasks/main.yaml | 04:02 |
mnaser | Ok, i'll try the same concept as `tox` | 04:03 |
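(A minimal sketch of the block/always pattern being borrowed from the tox role — the task contents and variable name are illustrative, not mnaser's actual job:)

```yaml
- name: Run the checks but always report back to Zuul
  block:
    - name: Run the actual test command
      command: tox -e linters
  always:
    - name: Return warnings to Zuul whether or not the checks failed
      zuul_return:
        data:
          zuul:
            warnings: "{{ collected_warnings }}"
```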
ianw | it will retry in pre-run, but not run? | 04:04 |
mnaser | ianw: but the jobs are failing in `run` and not `pre-run`, that's the weird part | 04:04 |
mnaser | i wonder if failures on `localhost` get retried, this is a nodeless job | 04:05 |
ianw | 4a609a2e704b43a39127240d3f93c72a was one of the retried ones | 04:12 |
mnaser | i wonder if `fail` or `assert` fails in a different way with ansible | 04:13 |
ianw | that went to ze06 | 04:15 |
ianw | i think the non-prefixing of exception lines has got me again :/ i feel like we've fixed that | 04:16 |
mnaser | ianw: i guess i could try removing the zuul_return part if you're not finding much more, but i'll wait before clobbering your logs more | 04:16 |
ianw | mnaser: https://paste.opendev.org/show/be7DwbeG4vRIVlrbkyEb/ seems to be the problem | 04:16 |
mnaser | interestinggg | 04:16 |
ianw | ahh, and you're sticking things in data: zuul: warnings: so i think maybe a smoking gun here :) | 04:19 |
mnaser | yes, i forgot map(attribute='msg') | 04:20 |
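(A hedged reconstruction of the filter-chain problem — variable names are assumptions, not mnaser's actual playbook. Handing zuul_return whole result objects, or an unevaluated selectattr() generator, instead of a list of message strings is what trips the executor:)

```yaml
- name: Return lint warnings to Zuul
  zuul_return:
    data:
      zuul:
        # broken: a generator of whole result dicts, not plain strings
        # warnings: "{{ lint.results | selectattr('failed') }}"
        # working: pick out the messages and force evaluation into a list
        warnings: "{{ lint.results | selectattr('failed')
                      | map(attribute='msg') | list }}"
```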
ianw | i'm sure i did something recently that returned exceptions like this a little better, at least as output in the build | 04:23 |
ianw | https://review.opendev.org/c/zuul/zuul/+/829617 is what i'm thinking of | 04:24 |
ianw | might need a similar catch around this bit | 04:25 |
mnaser | ok finally, was able to fix it | 04:25 |
mnaser | now i'll just wait for this to fail to see the comments show up | 04:26 |
mnaser | thanks for that little help ianw | 04:26 |
opendevreview | Ian Wienand proposed opendev/system-config master: gitea: set custom avatars for orgs https://review.opendev.org/c/opendev/system-config/+/834085 | 04:28 |
ianw | np :) | 04:28 |
*** ysandeep|out is now known as ysandeep | 05:05 | |
opendevreview | Ian Wienand proposed opendev/system-config master: gitea: set custom avatars for orgs https://review.opendev.org/c/opendev/system-config/+/834085 | 05:34 |
*** ysandeep is now known as ysandeep|brb | 05:54 | |
*** ysandeep|brb is now known as ysandeep | 06:10 | |
opendevreview | Ian Wienand proposed opendev/system-config master: gitea: set custom avatars for orgs https://review.opendev.org/c/opendev/system-config/+/834085 | 06:38 |
opendevreview | wangxiyuan proposed openstack/diskimage-builder master: Enable Yum mirror for openEuler element https://review.opendev.org/c/openstack/diskimage-builder/+/833969 | 07:13 |
opendevreview | Merged opendev/system-config master: Stop checking the OpenStackID HTTPS cert https://review.opendev.org/c/opendev/system-config/+/834094 | 07:31 |
opendevreview | Ian Wienand proposed opendev/system-config master: gitea: set custom avatars for orgs https://review.opendev.org/c/opendev/system-config/+/834085 | 07:41 |
opendevreview | Ian Wienand proposed opendev/system-config master: gitea: set custom avatars for orgs https://review.opendev.org/c/opendev/system-config/+/834085 | 07:45 |
*** arxcruz|off is now known as arxcruz | 07:50 | |
opendevreview | Ian Wienand proposed opendev/system-config master: gitea: set custom avatars for orgs https://review.opendev.org/c/opendev/system-config/+/834085 | 07:51 |
frickler | ianw: could you have a look at https://review.opendev.org/c/zuul/nodepool/+/826541 please? I've fixed the issue with sdk 1.0, but maybe it should be split off from the revert into its own patch? | 07:54 |
ianw | frickler: yeah, i'd probably agree in splitting out the fix | 07:56 |
ianw | is that fixing it, or skipping over the quota calculation if we don't get the details? | 07:58 |
frickler | ianw: both. the fix is to not check for the id attribute but for vcpus, because with the latest sdk we get a Flavor() object, which does have an id, but not the one we expect | 08:12 |
frickler | I then also added the skip so we don't fail if we still don't have useful flavor data | 08:12 |
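(In rough pseudocode, the logic frickler describes would look like this — an illustration, not the actual nodepool patch:)

```python
# Illustration only; the function and field names are assumptions.
def quota_from_flavor(flavor):
    # Newer openstacksdk always returns a Flavor() object that has an
    # ``id`` (just not the one the old check expected), so test for real
    # flavor data via ``vcpus`` instead of ``id``.
    if not getattr(flavor, 'vcpus', None):
        # No useful flavor data: skip the quota calculation rather than fail.
        return None
    return {'cores': flavor.vcpus, 'ram': flavor.ram}
```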
ianw | ok, i'm not sure if we should skip -- if we were skipping we wouldn't have noticed something like this? | 08:14 |
ianw | definitely think it should be its own change with ^^ in the commit message :) | 08:15 |
frickler | yeah, I'll do that and try to address the comments so things get clearer | 08:16 |
*** ysandeep is now known as ysandeep|lunch | 08:33 | |
*** jpena|off is now known as jpena | 08:36 | |
elodilles | clarkb fungi : no problem, i forgot to abandon it myself, sorry :S | 08:40 |
opendevreview | Ian Wienand proposed opendev/system-config master: gitea: set custom avatars for orgs https://review.opendev.org/c/opendev/system-config/+/834085 | 09:14 |
*** ysandeep|lunch is now known as ysandeep | 09:27 | |
*** pojadhav is now known as pojadhav|lunch | 09:36 | |
*** pojadhav|lunch is now known as pojadhav| | 10:12 | |
*** pojadhav| is now known as pojadhav | 10:13 | |
*** lajoskatona_ is now known as lajoskatona | 10:31 | |
priteau | Hello. Is opendev git having issues? I am getting very odd job failures | 10:32 |
priteau | error: Server does not allow request for unadvertised object ffe7c0baf1eb21f7b0c3eabe480348b376cf9286 | 10:32 |
priteau | warning: Clone succeeded, but checkout failed. | 10:32 |
priteau | https://zuul.opendev.org/t/openstack/build/39cdb2cfad2848faa95de52afbd910c5 | 10:50 |
priteau | And in another job: | 10:50 |
priteau | Running command git clone --filter=blob:none --quiet https://opendev.org/openstack/nova /home/zuul/src/opendev.org/openstack/blazar-nova/.tox/pep8/src/nova | 10:50 |
priteau | fatal: the remote end hung up unexpectedly | 10:50 |
priteau | fatal: protocol error: bad pack header | 10:50 |
priteau | warning: https unexpectedly said: '0000' | 10:50 |
priteau | warning: Clone succeeded, but checkout failed. | 10:50 |
priteau | https://zuul.opendev.org/t/openstack/build/a15cabe7fdd5441185844f996120b9c4 | 10:50 |
*** ysandeep is now known as ysandeep|afk | 11:04 | |
*** dviroel|ruck|afk is now known as dviroel|ruck | 11:26 | |
fungi | priteau: it's possible yesterday's gitea upgrade introduced a regression, or that we've got a corrupt object in the nova repo on one of the servers in the cluster, but also why are you cloning nova over the network in a job? that's highly inefficient, and why we cache those git repositories locally on all our job nodes | 11:27 |
fungi | i'm testing cloning nova, but it's taking a while... longer than i would expect. could be something on my end, could be something to do with the specific server my request got routed to, could be something related to the upgrade, could also just be my pre-caffeine imagination running wild... not sure yet | 11:43 |
*** iurygregory_ is now known as iurygregory | 11:45 | |
frickler | fungi: priteau: this code still references zuul-cloner and hasn't been touched in 3 years, likely could use some update https://opendev.org/openstack/blazar-nova/src/branch/stable/yoga/tools/tox_install.sh | 11:54 |
fungi | i'm surprised we haven't ripped out zuul-cloner already, that's a leftover from zuul v2 | 11:56 |
fungi | anyway, i've finished looking through the resource graphs for the load balancer and all the git servers. other than a bit of cache memory pressure i don't see anything especially alarming | 11:57 |
fungi | my clone of openstack/nova just now completed, so took roughly 15 minutes | 11:58 |
fungi | no errors though. i'll try cloning directly from each of the backends next | 11:59 |
fungi | though i expect it to take a while at this speed | 11:59 |
fungi | another thought, i wonder if something changed with the git protocol supported by the servers in the upgrade, which is impacting whatever older version of git is on those bionic nodes | 12:00 |
fungi | as an aside, why would an openstack project be running stable/yoga branch jobs on bionic? openstack switched to focal several releases ago | 12:00 |
priteau | I didn't realise we were running on bionic | 12:02 |
priteau | we are using the standard job templates | 12:02 |
frickler | seems the lower-constraints template could be updated | 12:07 |
priteau | fungi: the openstack-tox-py39 job runs on focal and has the same issue | 12:08 |
fungi | thanks, so it's probably nothing to do with the version of the git client | 12:08 |
fungi | i was able to clone directly from gitea01 successfully | 12:09 |
priteau | yeah, lower-constraints and py36 jobs ran on ubuntu-bionic, while pep8 and py39 ran on ubuntu-focal | 12:09 |
fungi | priteau: is the problem you're seeing consistent or intermittent for those jobs? | 12:09 |
frickler | gmann: how about bumping to py36 and focal? https://opendev.org/openstack/openstack-zuul-jobs/src/branch/master/zuul.d/jobs.yaml#L877-L892 | 12:09 |
frickler | ehm, py38 ofc | 12:10 |
priteau | All jobs failed consistently this morning. Not seen this issue ever before | 12:11 |
priteau | I'll check the tox_install.sh script. I don't remember much about it, from the comment at the top it works around issues seen when nova wasn't on PyPI, which isn't the case anymore. | 12:12 |
fungi | frickler: that could be one reason bionic is still used. yoga's pti says openstack will continue to test that branch against python 3.6 in order to ensure compatibility with centos stream 8 (but really it's a stand-in for rhel 8 since there's no newer rhel yet): https://governance.openstack.org/tc/reference/runtimes/yoga.html | 12:13 |
fungi | i've finished tests cloning from 01-03 so far. time to complete ranges between 10-15 minutes here at my house, which is reasonably well-connected | 12:20 |
fungi | no errors from git yet | 12:20 |
priteau | frickler: According to tox_install we should be using zuul-cloner to use the local git cache, but looking at logs I am guessing this program is gone from the CI environment, so we fall back to cloning on the network like we would from a dev's machine | 12:23 |
*** ysandeep|afk is now known as ysandeep | 12:23 | |
priteau | I can run these jobs fine on my machine though, no cloning issue here (but I have the latest git) | 12:24 |
fungi | yes, zuul-cloner was a tool used in the old zuul v2 pull model | 12:24 |
fungi | zuul v3+ pushes git refs to the nodes instead, so they'll already be there on disk | 12:25 |
priteau | I will try adding a check for the existence of a local repo and install from there | 12:26 |
frickler | pip uses the "--filter=blob:none" option for git clone, which works fine on my local impish and makes the clone finish in less than a minute, but exposes the failure on focal | 12:29 |
fungi | oh, could this be impact from the pip 22.0.4 release i wonder? | 12:31 |
fungi | that was something like 10 days ago, as was the virtualenv release which included it, but maybe something in the last day increased the virtualenv we were using to 20.13.3? | 12:32 |
frickler | no, I suspect a bug between old(ish) git and new gitea and that option | 12:33 |
fungi | yeah, that would make more sense given the timing | 12:33 |
frickler | can we revert one gitea server easily? | 12:33 |
frickler | or shall we create a revert of the update patch and hold a node with that? | 12:34 |
fungi | the latter would be pretty easy. i'm not sure how downgradeable gitea is in production | 12:35 |
fungi | i have a feeling a production downgrade would involve wiping and repopulating everything on the server, so may just be cleaner to replace the servers if it comes down to that | 12:36 |
priteau | I wonder if this issue might impact other jobs. Have you seen any rise in failed builds today? | 12:36 |
frickler | priteau: no, I hope not many jobs try to clone repos that way, but it will affect downstream installers | 12:41 |
fungi | though i also hope that not many programs are trying to pip install the clone url, and are instead installing from a local checkout for efficiency (or relying on packages) | 12:47 |
fungi | reading a bit more about the architecture, i wonder if this really boils down to our having upgraded the git version underneath gitea as part of yesterday's update. https://github.com/go-gitea/gitea/issues/11958 has some pointers on setting git options for similar situations | 12:53 |
fungi | maybe newer git on the server side has stopped doing some things that older git clients are expecting with regard to what objects it thinks are or aren't advertised | 12:54 |
*** anbanerj is now known as frenzyfriday | 12:58 | |
frickler | I'm trying to repro now locally using the upstream gitea containers first | 13:04 |
frickler | hmm, I get lots of errors when I try to push the nova repo to gitea. but what I do see is that with gitea 1.15 I get "warning: filtering not recognized by server, ignoring" which can also be seen in the successful blazar jobs until yesterday | 13:16 |
frickler | with 1.16.4 I no longer get that message. so this is very likely a regression in gitea, trying to support filtering but failing | 13:16 |
frickler | seems that https://github.com/go-gitea/gitea/pull/18195/files is the change in question | 13:19 |
*** arxcruz is now known as arxcruz|ruck | 13:25 | |
*** arxcruz|ruck is now known as arxcruz | 13:27 | |
frickler | ah, maybe the errors were because I cloned with the noblob option. trying another full clone from upstream now | 13:33 |
fungi | looks like there have been some adjustments as well in https://github.com/go-gitea/gitea/pull/18373 | 13:36 |
frickler | which version of gitea were we running before yesterday? older than 1.15 iirc? | 13:38 |
fungi | v1.15.11 | 13:39 |
fungi | according to https://review.opendev.org/828184 | 13:40 |
frickler | hmm, cannot reproduce with my local clone copy, so I'm rsyncing the repo from gitea01 now to test | 13:49 |
frickler | o.k., I should try with something smaller; it seems all repos are affected, at least the others I tried. smallest for now is https://opendev.org/openstack/python-designateclient | 13:56 |
*** pojadhav is now known as pojadhav|out | 14:19 | |
frickler | o.k., the workaround documented in the above PR, setting git.DISABLE_PARTIAL_CLONE = true in gitea's app.ini works | 14:20 |
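(The dotted notation maps onto an app.ini stanza like this:)

```ini
[git]
DISABLE_PARTIAL_CLONE = true
```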
frickler | clarkb: fungi: I'm in a meeting now, maybe you can check how to do that for our deployment. or verify on one of our servers first | 14:21 |
fungi | thanks, great find! i'm in meetings for the next couple of hours as well but should have some bandwidth to multi-task so i'll start looking into it | 14:22 |
*** tosky_ is now known as tosky | 14:27 | |
*** ysandeep is now known as ysandeep|afk | 14:29 | |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Disable partial clone feature in Gitea https://review.opendev.org/c/opendev/system-config/+/834174 | 14:31 |
fungi | frickler: priteau: ^ | 14:31 |
priteau | Thanks. In the meantime I fixed it by cloning from the local git cache | 14:32 |
fungi | priteau: to be fair, you should definitely avoid cloning from a network git remote in zuul jobs | 14:33 |
priteau | Indeed, it was a good way to discover that we were relying on the retired zuul-cloner | 14:33 |
fungi | if you add openstack/nova to the required projects list for the job definition, zuul will provide an up-to-date checkout which will even work with cross-repo deps (depends-on) | 14:34 |
fungi | you can look at some of the neutron driver or horizon plugin projects for examples | 14:34 |
fungi | they similarly rely on installing neutron or horizon from a git checkout | 14:34 |
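(A hedged sketch of the job stanza fungi is describing — the job name and parent are assumptions:)

```yaml
- job:
    name: blazar-nova-tox-py39
    parent: openstack-tox-py39
    required-projects:
      - openstack/nova
    # Zuul prepares a checkout of the matching branch of nova under
    # ~/src/opendev.org/openstack/nova on the node, honoring Depends-On,
    # so the job can install it from disk instead of cloning over the network.
```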
priteau | I think we should be using this for stable branches, as currently we must be testing blazar-nova stable with nova master | 14:49 |
fungi | yep | 14:49 |
fungi | if you relied on setting it as a required project, zuul would check out the corresponding stable branch of nova for you automatically | 14:49 |
gmann | frickler: ah right, we have to do that as per zed testing bits. I will push patch. thanks | 14:50 |
clarkb | frickler: fungi: is there a tldr? I don't think downgrading is straightforward. Would require rebuild of entire cluster using db from backups to preserve redirects | 15:15 |
clarkb | is it just zuul-cloner that broke? if so I'm inclined to say sorry that was deprecated and should've been removed from your jobs years ago | 15:16 |
fungi | clarkb: tl;dr is in 834174 | 15:16 |
clarkb | aha thanks looking | 15:16 |
clarkb | oh that is an interesting thing. Should we report it upstream to pip as well? | 15:17 |
fungi | probably, i also had meetings so haven't gotten farther with it yet | 15:17 |
fungi | not quite sure at this point what the suggestion for pip is without nailing down which git versions are involved | 15:18 |
clarkb | I'm +2 on disabling partial clones. That seems like a safe enough change if that was already the behavior in 1.15.11 | 15:18 |
clarkb | 4b3bfd7e89cd1527d500ac44c2564d398a6b681e is the gitea commit that added it and does seem to be 1.16 specific. Also it notes that partial clones are disabled in git by default, so why would they enable them by default? Seems like they should've inverted this toggle and let people opt into it... | 15:20 |
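(Background on "disabled in git by default": a stock git server only advertises partial-clone filtering when the operator opts in, along these lines:)

```console
# upload-pack only honors --filter when the server side sets this
git config uploadpack.allowFilter true
```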
clarkb | anyway ++ to deploying that | 15:20 |
clarkb | fungi: frickler: do we think we need a hold for 834174 to verify before approval or just send it? | 15:21 |
fungi | clarkb: we can hold a dnm child of that change for further testing, sure, though frickler already confirmed it fixed the problem on a local install | 15:23 |
clarkb | oh I see, the client sends the filters it wants. So if I just do a git clone I should get a full proper clone even under 1.16, but if the client is trying to be smart (like pip) then maybe things go wrong | 15:23 |
clarkb | fungi: oh in that case I think we can land it | 15:23 |
fungi | i mean, we could also take gitea01 out of the pool and update its config by hand to test if that's desirable | 15:24 |
clarkb | I'm not too worried about it considering frickler already tested locally | 15:25 |
clarkb | as a followup we might consider updating our test job to do whatever it was that tripped over this | 15:25 |
clarkb | we have content in the system-config repo populated in those test jobs | 15:25 |
clarkb | I guess just git clone --filter=blob:none localhost/opendev/system-config? I can work on a patch for that | 15:26 |
fungi | `pip install git+https://opendev.org/openstack/designate` for example, or we could do `git clone ...` yeah | 15:26 |
*** ysandeep|afk is now known as ysandeep | 15:29 | |
opendevreview | Clark Boylan proposed opendev/system-config master: Test gitea 1.16 partial clones https://review.opendev.org/c/opendev/system-config/+/834187 | 15:32 |
clarkb | that isn't based on 834174 so we should see it fail. Then we can rebase it and see it pass | 15:32 |
fungi | nice, thanks | 15:33 |
*** ykarel is now known as ykarel|away | 15:34 | |
*** ysandeep is now known as ysandeep|out | 15:37 | |
frickler | unrelated news (probably): github seems to be having some issues, too https://www.githubstatus.com/ | 15:38 |
fungi | thanks for the heads up! | 15:39 |
clarkb | maybe they upgraded to gitea 1.16 too :P | 15:39 |
fungi | #status log Pruned backups on backup01.ord.rax in order to free up some storage, info in /opt/backups/prune-2022-03-17-02-13-07.log | 15:40 |
opendevstatus | fungi: finished logging | 15:40 |
clarkb | fungi: I wonder if that is the first time we've pruned that host. We may actually want to look over that log and check the backups that remain to ensure we didn't over-prune. There were issues in the past with certain backup name prefix overlaps causing that to happen for some hosts; we thought we had sorted those out, but double checking is probably a good idea | 15:41 |
clarkb | in particular if a host had multiple backup targets (typicalyl due to fs backup and db backup) the pruning of one or the other could delete all backups for the other one | 15:41 |
clarkb | we had to switch them to unique prefixes to avoid that iirc | 15:41 |
clarkb | fungi: probably sufficient to do a backup listing and ensure that services with database backups continue to have the correct date spread for fs and db backups | 15:42 |
fungi | i do see the log say it's keeping both filesystem and database backups | 15:45 |
fungi | like "Keeping archive: paste01-mariadb-2022-03-16T17:28:13 Wed, 2022-03-16 17:28:15 [31cdd5957c5829be0ac464c83424b1a055e3f0d30c67c450addc83e5b4aa799c]" | 15:45 |
clarkb | and similar for paste01-fs-* or whatever it is called? Do we see that for review02 and gitea01 and so on? | 15:46 |
fungi | right, also paste01-filesystem-... | 15:46 |
clarkb | cool. As long as we see that for more than just paste we should be good | 15:47 |
clarkb | in particular I worry about review db backups causing filesystem backups to be removed since gerrit db backups aren't worth much these days | 15:47 |
fungi | it seems to treat "/opt/backups/borg-paste01/backup archive paste01-mariadb" separately from "/opt/backups/borg-paste01/backup archive paste01-filesystem" | 15:47 |
clarkb | yes they are distinct backup archives | 15:48 |
fungi | review01-filesystem and review01-mysql both kept, yes | 15:48 |
clarkb | the problem before was the prune command takes a target archive prefix (due to the common practice of appending dates to the backups) and then prunes everything it can match. So if you just pruned paste01 it would get confused | 15:49 |
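(Illustrating the hazard — the flags, paths, and retention policy here are assumptions, not the production prune script:)

```console
# pruning with the bare host prefix matches BOTH archive series at once,
# so retention counts are computed across the mixed filesystem+db set:
borg prune --prefix 'paste01' --keep-daily 7 /opt/backups/borg-paste01/backup
# unique per-stream prefixes keep the two series independent:
borg prune --prefix 'paste01-filesystem-' --keep-daily 7 /opt/backups/borg-paste01/backup
borg prune --prefix 'paste01-mariadb-' --keep-daily 7 /opt/backups/borg-paste01/backup
```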
clarkb | fungi: review01 doesn't exist anymore :) what about review02? | 15:49 |
fungi | oh, right ;) | 15:49 |
clarkb | but ya I think we addressed it all when we noticed the problem and now its just verification we didn't miss anything. Sounds like it is probably working as expected | 15:49 |
fungi | Keeping archive: review02-filesystem-2022-03-16T17:46:02 Wed, 2022-03-16 17:46:04 [7c55371b17f71d6c5ff23171d1bfb08b327eaa320f2c0718fe23fbb2a540f136] | 15:50 |
fungi | Keeping archive: review02-mariadb-2022-03-16T17:50:32 Wed, 2022-03-16 17:50:33 [20a894f13f396ecb69f98dbc9d07fbb9c8bf460ebd4f6428d4b52fe7ff2edeea] | 15:50 |
clarkb | perfect! | 15:51 |
fungi | system-config-run-gitea failed on 834174, looking into it | 15:52 |
clarkb | fungi: looks like failure to encrypt the logs | 15:53 |
fungi | encrypt-file: Validate input file | 15:53 |
clarkb | ya one of those files (of course ansible doesn't tell you which) does not exist | 15:53 |
clarkb | https://zuul.opendev.org/t/openstack/build/66afc5f33d674bab8c29c5aea9240faa/console#3/1/15/bridge.openstack.org I'm not sure how to read that. It shows 7 inputs but one output | 15:55 |
clarkb | does that imply the first input is the one that failed? I think that file not existing would make sense since it is just a test playbook not a normal prod run playbook with the logging setup | 15:55 |
clarkb | https://zuul.opendev.org/t/openstack/build/66afc5f33d674bab8c29c5aea9240faa/console#3/1/12/bridge.openstack.org hrm but that shows the files existing | 15:56 |
*** dviroel|ruck is now known as dviroel|ruck|lunch | 15:56 | |
fungi | probably the first time we've run this job since 829853 merged | 15:56 |
fungi | it added the "file does not exist" error for when: not _stat_result.stat.exists | 15:57 |
fungi | previously we only checked for encrypt_file is undefined | 15:57 |
clarkb | ah. In that case maybe the issue is it is checking that entire list as if it were a single file | 15:57 |
clarkb | which is sort of how I'm reading that 7 inputs one output now | 15:58 |
clarkb | its a single input with 7 list entries | 15:58 |
fungi | same | 15:58 |
fungi | didn't explode it for some reason | 15:58 |
fungi | seems to have interpreted it as a string | 15:58 |
clarkb | ya | 15:59 |
clarkb | fungi: I think we should do a partial revert of that change to revert the stat stuff? | 16:00 |
clarkb | the commit says this was done to avoid a weird error from gpg later. I think we can live with those weird errors for now | 16:00 |
fungi | if it's not obvious how to pass a list of files to a path parameter in ansible, then yes | 16:02 |
clarkb | well I think we have to rewrite the validation to do a loop and check each one separately if it is a list. If it isn't a list then check the single input var | 16:03 |
clarkb | I'm happy for someone to work on that too if they like | 16:03 |
clarkb | oh actually the rest of the code in there already does that so we can copy that probably | 16:04 |
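(Something like this is the shape of the loop-based validation being discussed — a sketch, not the final zuul-jobs patch:)

```yaml
- name: Normalize the encrypt_file input to a list
  set_fact:
    _encrypt_file_list: "{{ [encrypt_file] if encrypt_file is string else encrypt_file }}"

- name: Stat each input file separately
  stat:
    path: "{{ item }}"
  register: _encrypt_file_stats
  loop: "{{ _encrypt_file_list }}"

- name: Fail early if any input file is missing
  fail:
    msg: "encrypt-file: {{ item.item }} does not exist"
  when: not item.stat.exists
  loop: "{{ _encrypt_file_stats.results }}"
```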
fungi | yeah, i'm fine undoing the change to roles/encrypt-file/tasks/main.yaml from 834174, i'll push that up along with an unrevert and try to make it exercise the failing job | 16:04 |
fungi | oh, or that | 16:05 |
clarkb | either way :) | 16:05 |
clarkb | maybe just push the partial revert for now | 16:05 |
clarkb | and we can work on fixing it properly as followup | 16:05 |
fungi | on it | 16:05 |
opendevreview | Clark Boylan proposed zuul/zuul-jobs master: Fix encrypt files stat validation https://review.opendev.org/c/zuul/zuul-jobs/+/834194 | 16:12 |
clarkb | fungi: ^ something like that for fixing it but I'm not super confident in that so I think we land the partial revert and then rebase this on that and work through it more carefully | 16:12 |
opendevreview | Jeremy Stanley proposed zuul/zuul-jobs master: encrypt-file: roll back extended file stat https://review.opendev.org/c/zuul/zuul-jobs/+/834196 | 16:17 |
clarkb | +2 thanks | 16:18 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Disable partial clone feature in Gitea https://review.opendev.org/c/opendev/system-config/+/834174 | 16:18 |
opendevreview | Jeremy Stanley proposed zuul/zuul-jobs master: Revert "encrypt-file: roll back extended file stat" https://review.opendev.org/c/zuul/zuul-jobs/+/834197 | 16:22 |
fungi | clarkb: ianw: wip for the moment ^ | 16:22 |
fungi | oh, you already pushed 834194 | 16:23 |
clarkb | fungi: ya do you want to combine it with 834197? | 16:23 |
fungi | i'll abandon 834197 | 16:23 |
clarkb | sounds good I can rebase on 834196 | 16:23 |
clarkb | then we can sort out how to test it better maybe | 16:23 |
fungi | perfect | 16:24 |
opendevreview | Clark Boylan proposed zuul/zuul-jobs master: Fix encrypt files stat validation https://review.opendev.org/c/zuul/zuul-jobs/+/834194 | 16:24 |
clarkb | that's the rebase. Happy for others to update it if they see a good way to test it. I need some breakfast but can try and take a look after that | 16:25 |
clarkb | another thing we may want to do is make that role non fatal | 16:25 |
clarkb | it's a really nice-to-have but the logs are always available otherwise | 16:25 |
fungi | it's been a busy morning and i'm overdue for a shower, so going to disappear for a few while we wait for test results | 16:27 |
opendevreview | Marios Andreou proposed openstack/project-config master: Update channel ops for oooq (tripleo ci) channel https://review.opendev.org/c/openstack/project-config/+/834199 | 16:41 |
*** marios is now known as marios|out | 16:47 | |
opendevreview | Clark Boylan proposed zuul/zuul-jobs master: Fix encrypt files stat validation https://review.opendev.org/c/zuul/zuul-jobs/+/834194 | 16:56 |
clarkb | the testing we have was sufficient to catch ^ | 16:56 |
opendevreview | Clark Boylan proposed zuul/zuul-jobs master: Fix encrypt files stat validation https://review.opendev.org/c/zuul/zuul-jobs/+/834194 | 16:59 |
clarkb | and I think that should cover this case | 16:59 |
opendevreview | Clark Boylan proposed opendev/system-config master: Test gitea 1.16 partial clones https://review.opendev.org/c/opendev/system-config/+/834187 | 17:02 |
*** dviroel|ruck|lunch is now known as dviroel|ruck | 17:05 | |
fungi | awesome | 17:06 |
lajoskatona | diablo_rojo_phone: Hi, regarding the Yoga marketing screenshots for Local IP and Off-path SmartNIC | 17:12 |
lajoskatona | diablo_rojo_phone: today I found only the official Neutron doc: | 17:13 |
lajoskatona | diablo_rojo_phone: https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html#launch-an-instance-with-remote-managed-port | 17:13 |
lajoskatona | diablo_rojo_phone: https://docs.openstack.org/neutron/latest/contributor/internals/local_ips.html#usage | 17:14 |
*** jpena is now known as jpena|off | 17:27 | |
diablo_rojo_phone | lajoskatona: yeah I think I had those links already. | 17:34 |
diablo_rojo_phone | What question did you have? | 17:34 |
lajoskatona | diablo_rojo_phone: It's not a question, just "feedback" that those are all I have from the developers of the 2 features :-) | 17:35 |
diablo_rojo_phone | Ahh got it :) Noted! | 17:36 |
lajoskatona | diablo_rojo_phone: +1 | 17:36 |
lajoskatona | diablo_rojo_phone: is there anything I can help with for the marketing material? | 17:36 |
opendevreview | Clark Boylan proposed zuul/zuul-jobs master: Fix encrypt files stat validation https://review.opendev.org/c/zuul/zuul-jobs/+/834194 | 17:56 |
clarkb | we'll get there eventually :) | 17:56 |
fungi | 834174 is passing now when stacked on top of the zuul-jobs partial revert | 18:03 |
fungi | which was approved moments ago | 18:03 |
fungi | so hopefully we're close to being able to land the gitea fix | 18:04 |
clarkb | woot | 18:04 |
clarkb | https://zuul.opendev.org/t/openstack/build/e514209277c546288d95695c53777a41/log/job-output.txt#38524 my test shows the error. Now I'll rebase that on 834174 to see that it goes away, then we can land the test too to prevent future regressions | 18:06 |
opendevreview | Merged zuul/zuul-jobs master: encrypt-file: roll back extended file stat https://review.opendev.org/c/zuul/zuul-jobs/+/834196 | 18:06 |
opendevreview | Clark Boylan proposed opendev/system-config master: Test gitea 1.16 partial clones https://review.opendev.org/c/opendev/system-config/+/834187 | 18:08 |
clarkb | there is the rebase to hopefully show things working and happy | 18:08 |
fungi | excellent | 18:09 |
clarkb | fungi: maybe we can +A 834174 carrying over frickler's previous +2? | 18:09 |
clarkb | or if you want to wait for 834187 to pass that seems fine too. I trust fricklers local testing though | 18:09 |
fungi | yeah, i'm good with approving it | 18:10 |
clarkb | I'm going to see about filing a bug against gitea and/or pip if I can make sense of what this clone flag does | 18:10 |
fungi | all i did was add a depends-on after his +2 | 18:10 |
clarkb | mostly not sure if pip's flag is bad or if gitea handles it wrong | 18:10 |
clarkb | fungi: ++ | 18:10 |
clarkb | fungi: do you want to +A or should I? | 18:11 |
fungi | clarkb: one of the complexities of this situation is that newer git clients don't break | 18:11 |
clarkb | oh interesting | 18:11 |
fungi | so apparently that clone option is valid on versions of git which can't use it with gitea | 18:11 |
clarkb | fungi: the bionic version breaks. Do we know if focal's works? | 18:11 |
fungi | focal's breaks too | 18:11 |
clarkb | oh so it is pretty new git that would be needed | 18:12 |
fungi | but frickler said the version on his impish system was working | 18:12 |
fungi | so yes, relatively bleeding-edge | 18:12 |
clarkb | in that case I'm inclined to think it is likely a gitea bug | 18:12 |
clarkb | where their server side implementation isn't implementing something compatible with older git | 18:13 |
fungi | like they only tested this implementation with very new git and missed some backward-compatible bits | 18:13 |
clarkb | they may need to ignore the filter specification based on the version of the client | 18:13 |
clarkb | ya | 18:13 |
fungi | anyway, having a consistent reproducer with git kind of rules out pip as being at fault | 18:13 |
clarkb | ya my tumbleweed git can clone with that filter against opendev.org | 18:14 |
clarkb | so I've confirmed that behavior that frickler saw | 18:14 |
clarkb | impish is 2.32.0 I'm on 2.35.1 | 18:15 |
clarkb | focal is 2.25.1 | 18:15 |
fungi | i can approve 834174 once 834196 merges if nobody beats me to it. there's no real speedup for approving it sooner than that anyway, so i'm happy to give a few more minutes for (re-)reviews | 18:15 |
clarkb | I'm going to double check focal locally and then work on a bug upstream with gitea | 18:15 |
clarkb | fungi: 834196 has merged | 18:16 |
clarkb | hirsute with 2.30.2 works | 18:18 |
fungi | oh, right i missed that in scrollback | 18:19 |
clarkb | https://github.com/go-gitea/gitea/issues/19118 filed against gitea | 18:34 |
clarkb | https://review.opendev.org/c/zuul/zuul-jobs/+/834194 passes testing now. Probably a good one to have ianw review before approving just to make sure there aren't other instances of this sort of problem in the new role | 18:46 |
clarkb | ricolin: fungi: I see there is another change to remove the -tw list https://review.opendev.org/c/opendev/system-config/+/584035 I guess which one is preferable? taking ownership of it or removing it? | 18:53 |
clarkb | https://review.opendev.org/c/opendev/system-config/+/792020 cc infra-root anyone remember what that was trying to fix? I am not aware of any existing issues related to python with gerrit and I think we can abandon that | 19:01 |
fungi | clarkb: yeah, i don't feel like i have enough context on what 834194 was trying to fix (error from passing a path for a nonexistent file to gpg?) and would definitely appreciate ianw's input there | 19:07 |
fungi | it seems like the original implementation was just an attempt to catch an error condition sooner in the job and return a more helpful error | 19:08 |
opendevreview | Merged opendev/system-config master: Disable partial clone feature in Gitea https://review.opendev.org/c/opendev/system-config/+/834174 | 19:16 |
fungi | clarkb: frickler: the gitea config update has deployed, are you still able to recreate the failures? | 19:59 |
fungi | it doesn't look like updating the config restarts the container | 20:00 |
fungi | but maybe the config is reread dynamically? | 20:01 |
clarkb | fungi: just sitting back down after lunch I'll check | 20:07 |
clarkb | fungi: no still seems to fail. I think we need to do rolling restarts of gitea. Those are a bit tricky because ordering matters to prevent gerrit replication issues. | 20:09 |
clarkb | fungi: I'm happy to do them or walk someone else through them if interested. Basically you disable the node in haproxy, docker-compose down, then docker compose up -d only the mariadb and gitea-web processes. Then once web is responding you up -d gitea-ssh too. Then reenable in haproxy | 20:10 |
clarkb | this ensures that gerrit can't push to gitea when gitea web is unable to trigger the appropriate hooks to record the updates | 20:10 |
clarkb | oh this is interesting though https://docs.gitea.io/en-us/config-cheat-sheet/#server-server indicates a sighup might also work | 20:11 |
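(The per-backend sequence spelled out — service names taken from the description above, exact commands assumed:)

```console
# 1. disable the backend in haproxy so it stops receiving traffic
# 2. on the gitea server, stop everything, then bring up db + web first
docker-compose down
docker-compose up -d mariadb gitea-web
# 3. once the web service responds, start ssh last so gerrit cannot
#    replicate while the web hooks that record updates are unavailable
docker-compose up -d gitea-ssh
# 4. re-enable the backend in haproxy
```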
clarkb | I'll go ahead and start the typical process with gitea01 momentarily .Then we can double check behavior against it at least | 20:12 |
clarkb | 'warning: filtering not recognized by server, ignoring' but my clone worked | 20:15 |
clarkb | that was against 01 after doing a restart of the service. I'll go through the other 7 in sequence | 20:15 |
clarkb | and done | 20:27 |
clarkb | I did my clone test against all 8 backends and they all emit that warning but succeed otherwise | 20:27 |
clarkb | I think this is likely happy now. priteau you can rerun those jobs I guess | 20:28 |
priteau | Thank you clarkb, I will give them a try | 20:28 |
priteau | Sorry for generating all that work, I didn't realise my message this morning would send you down this rabbit hole | 20:28 |
fungi | awesome thanks! | 20:29 |
fungi | priteau: no need to apologize, we appreciate the early notice of a regression | 20:29 |
clarkb | it also produced a bug report upstream so hopefully this helps them improve their software too | 20:31 |
opendevreview | Steve Baker proposed openstack/diskimage-builder master: Move grub-install to the end, and skip for partition images https://review.opendev.org/c/openstack/diskimage-builder/+/826976 | 20:32 |
ianw | argghh sorry about the encrypt-file breakage | 20:39 |
clarkb | ianw: no worries. I think I got a fix with test coverage pushed but we wanted to make sure it made sense to you before landing it. In the meantime we just landed a partial revert of the change that broke things | 20:44 |
fungi | yeah, it was not hard to find or undo, don't sweat it | 20:45 |
clarkb | side note on the gitea thing. It occurred to me that the bug may be in the server side git install depending on how much gitea passes through to git. The alpine 3.13 git version matches debian bullseye's git, which means we install the same git version as the upstream docker images. I am able to reproduce the issue against try.gitea.io as well. | 20:50 |
ianw | thanks, that fix looks good. it was just to give a nice error early rather than pass missing files to gpg | 20:51 |
priteau | clarkb, fungi: Build succeeded! | 20:52 |
clarkb | woot | 20:52 |
fungi | thanks for confirming, priteau! | 20:53 |
ianw | clarkb/fungi: if you have a sec, should we confirm a plan for opendev/gerrit retirement? i'm happy to execute | 20:55 |
clarkb | ianw: catching up on that was on my list of todos but then I got distracted | 20:55 |
*** dviroel|ruck is now known as dviroel|afk | 20:55 | |
clarkb | ianw: you're suggesting that instead of deleting all the repo content on every branch as we typically do we retire it normally otherwise and then update robots.txt to prevent it from being indexed? | 20:56 |
clarkb | The downside to that is we'd have to update robots on gerrit and gitea ya? | 20:56 |
clarkb | oh wait you're retiring the master branch on the gerrit repo as normal | 20:57 |
ianw | right, i guess i can do that for the other | 20:57 |
ianw | $ git branch -a | grep gerrit/openstack/ | wc -l | 20:57 |
ianw | 20 | 20:57 |
ianw | -ish branches ... | 20:57 |
fungi | normally we only delete and replace content from the master branch, i think | 20:57 |
ianw | yeah | 20:58 |
clarkb | in this case the master branch was never the interesting one | 20:58 |
fungi | as i mentioned yesterday, we could consider simply deleting the other branches as well | 20:58 |
ianw | but that still leaves the old changes indexable | 20:58 |
clarkb | but it is the one people are most likely to discover so I'm fine with that | 20:58 |
fungi | any of the changes we merged onto those branches will not be garbage-collected anyway because of the named refs for them | 20:58 |
ianw | so you won't be able to browse "what was openstack/opendev running when they were using gerrit 2.X" in the big picture, but individual changes are still there? | 21:00 |
ianw | whether "what was openstack running" is actually useful is questionable | 21:00 |
clarkb | I think I would prefer to retire them instead of deleting them? | 21:00 |
clarkb | the utility of that is probably minimal. If someone is running 2.x still (and they show up on the mailing list occasionally) they may be interested in our patches | 21:01 |
clarkb | for our own needs I'm not sure it is helpful anymore | 21:01 |
ianw | my concern is that google still looks at the change history, and indexes everything | 21:01 |
clarkb | well I don't think we should delete changes in gerrit | 21:02 |
clarkb | (we have that ability but it is something we should avoid) | 21:02 |
fungi | to see "what was opendev running" it's still fairly trivial to filter changes in the gerrit webui even if the branches to which they merged are no longer present | 21:02 |
ianw | i tend to agree, hence the robots proposal to kick it out of search engines :) | 21:03 |
clarkb | ianw: that won't kick out changes though, just the branches | 21:03 |
ianw | hrm, as in a search of merged changes on branch openstack/2.13? | 21:04 |
clarkb | since gitea doesn't serve the change refs via its web ui | 21:04 |
clarkb | oh I thought you meant specific change refs like refs/changes/56/123456 | 21:04 |
clarkb | it would block access to openstack/* branches via the branch refs | 21:04 |
clarkb | left a thought on https://review.opendev.org/c/openstack/project-config/+/833939 about maybe delaying the switch of the acls while we clean up branches. But we can do the other changes in the meantime? | 21:04 |
ianw | sorry i'm just talking about search results, as in blocking opendev/gerrit/* stops indexing of everything | 21:05 |
clarkb | but only in gitea as proposed | 21:06 |
clarkb | the gerrit chagnes would still be indexed on review.opendev.org | 21:06 |
clarkb | which is what I'm trying to clarify | 21:06 |
ianw | yes, i don't think those changes matter, that's our changes | 21:06 |
ianw | the confusing thing is when you search for gerrit things, and then end up on opendev.org looking at our tree, but it's an "upstream" change | 21:07 |
ianw | if that makes sense ... it's happened to me quite a few times, when searching for gerrit specific stuff | 21:07 |
clarkb | ya I think so. You want to know the current state but get our stale fork/mirror so the content isn't accurate | 21:07 |
clarkb | if we retire the branches and replicate that state the indexers should catch up that way too right? and then we wouldn't need to have a special robots rule? That might be preferable? | 21:08 |
clarkb | (I keep a local git clone fwiw and search through that as the google hosted repo isn't easy to navigate either) | 21:08 |
ianw | i'm thinking that if the indexer hits https://opendev.org/opendev/gerrit/commits/branch/master though, although the latest commit will wipe everything, if it walks back in the history it might still be indexing all the old upstream commits | 21:09 |
opendevreview | Merged zuul/zuul-jobs master: Fix encrypt files stat validation https://review.opendev.org/c/zuul/zuul-jobs/+/834194 | 21:10 |
clarkb | it does, but I think the google indexer is smart enough to deprioritize the old code | 21:10 |
clarkb | fungi: left a response to your comment on https://review.opendev.org/c/zuul/zuul-jobs/+/834194 | 21:10 |
clarkb | fungi: we do test the single file case earlier in the test case | 21:11 |
ianw | fungi's point that https://review.opendev.org/q/project:opendev/gerrit+branch:openstack/2.13 is also a view of what we were doing is good too | 21:11 |
corvus | i'd like to begin a rolling zuul restart; any thoughts? | 21:13 |
corvus | (load looks quite low actually) | 21:13 |
clarkb | corvus: the changes to fix our gitea deployment are all complete so no concerns there | 21:14 |
clarkb | I need to go do a school run nowish though so can't help much | 21:14 |
clarkb | no objections from e | 21:14 |
fungi | sounds great corvus, thanks! | 21:16 |
ianw | clarkb: perhaps then delete branches !master, and i could update the readme to show gerrit searches that show the changes in each branch? | 21:16 |
fungi | clarkb: oh, yep i missed that other test case, awesome | 21:16 |
ianw | and assume that search engines will not walk backwards and index the changes too much | 21:17 |
fungi | ianw: clarkb: the branch deletion approach is compelling insofar as it doesn't require bespoke .htaccess rules or the like which are only controlled by the sysadmin team... this is a solution any project can apply | 21:18 |
fungi | or robots.txt entries | 21:19 |
ianw | i feel like most other projects aren't that concerned about indexing of the changes, though | 21:22 |
corvus | beginning rolling restart of mergers/executors | 21:23 |
opendevreview | Ian Wienand proposed opendev/system-config master: [wip] gitea: set custom avatars for orgs https://review.opendev.org/c/opendev/system-config/+/834085 | 22:00 |
ianw | ohhh, it's always something | 22:48 |
ianw | because the avatar is referenced like "https://173.231.255.102:3081/avatars/opendev" it gets served as text/xml, not image/svg+xml | 22:49 |
ianw | so it works when you look at it, but doesn't work when embedded in a page, because ... something or other about xml namespaces blah blah | 22:49 |
ianw | hrm, even adding .svg to it doesn't seem to help | 22:51 |
fungi | xml: this is why we can't have nice things | 22:51 |
fungi | we might be able to force it in the apache proxy layer if the urls are consistent | 22:52 |
fungi | but that seems like an awful hack | 22:52 |
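(The hack in question would presumably look something like this — shown for illustration only, it was never actually deployed:)

```apache
<LocationMatch "^/avatars/">
    # override the text/xml response type gitea sends for svg avatars
    ForceType image/svg+xml
</LocationMatch>
```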
ianw | if err := png.Encode(w, *m); err != nil { | 23:01 |
ianw | looks like it really has to be a png | 23:01 |
ianw | that's the upload backend, that seems to convert whatever comes in to a png | 23:01 |
fungi | oh, huh | 23:22 |
clarkb | I guess we need PNGs? | 23:26 |
clarkb | sorry I decided to update laptop firmware after school run since I'll rely on laptop more with family visiting next week and that took much longer than I expected. fwupdmgr didn't work for whatever reason so I had to fall back to manual process | 23:27 |
opendevreview | Ian Wienand proposed opendev/system-config master: gitea: set custom avatars for orgs https://review.opendev.org/c/opendev/system-config/+/834085 | 23:39 |
opendevreview | Ian Wienand proposed opendev/system-config master: gitea: set custom avatars for orgs https://review.opendev.org/c/opendev/system-config/+/834085 | 23:40 |
ianw | ^ png's it ... it will look like https://173.231.255.102:3081/opendev/ | 23:41 |
ianw | (until i delete that held node :) | 23:41 |
clarkb | ianw: left some thoughts | 23:49 |
clarkb | ianw: fungi: https://review.opendev.org/c/opendev/system-config/+/834187 that should be a good update to gitea testing to help prevent regressions like the one we just had today | 23:50 |
fungi | thanks for the reminder, i meant to look over that. lgtm! | 23:52 |
opendevreview | Clark Boylan proposed opendev/system-config master: Rebuild Gerrit images https://review.opendev.org/c/opendev/system-config/+/834244 | 23:53 |