clarkb | the emails about needing 2fa for the old github account for review-dev make me wonder if we should just kill that account? | 00:16 |
---|---|---|
fungi | or just keep ignoring it, and add 2fa if we ever need to log into it | 00:31 |
frickler | this zuul config error seems to say that we don't actually check all branches anymore. but I also really would like to see us have a way to get rid of these errors before adding more repos from github: Zuul encountered an error while accessing the repo sqlalchemy/sqlalchemy. The error was: Will not fetch project branches as read-only is set | 05:41 |
frickler | corvus: ^^ do you have some context for that? | 05:41 |
*** tosky_ is now known as tosky | 12:50 | |
fungi | frickler: i don't have an answer, but the earliest version of that error was introduced more than two years ago with the initial patchset of https://review.opendev.org/816807 and code comments around the current version of the two places it can be raised (as either LookupError or RuntimeError) indicate we should expect that when the scheduler either hasn't attempted to fetch branches from a | 14:07 |
fungi | project yet or has tried and got an error from the remote when it did | 14:07 |
fungi | so my guess is that something changed permissions in the sqlalchemy/sqlalchemy repo or there was a transient github api error the last time a scheduler tried to get its branches | 14:08 |
frickler | fungi: well the error seems to show up for all repos we use from github, I just used that specific one as an example. the error has also been persistently listed for at least some months, so I don't think it is any transient behaviour | 14:38 |
frickler | as mentioned in the review adding eventlet, it might be interesting to see if the error also appears if we actually add the zuul app to it on github | 14:41 |
frickler | one could also check whether the error actually appears for all github projects or just some subset | 14:41 |
corvus | the error means that the scheduler did not put all of the expected information about the project into the zk cache, and the web server, which is responsible for producing those errors, refuses to do that work because it's not its job. it's certainly not working as designed, but exactly why will need some investigation. the scheduler may still be querying all the data, and whatever the web server thinks is missing might also cause the scheduler to query too often. | 14:59 |
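To make the split corvus describes concrete, here is an illustrative sketch (not Zuul's actual code; class and method names are hypothetical) of a branch cache that only the scheduler may populate, while a read-only consumer such as zuul-web raises the "Will not fetch project branches as read-only is set" error instead of fetching branches itself:

```python
# Illustrative sketch only -- not Zuul's real implementation.
# It models the behaviour described above: only the scheduler may
# populate the branch cache; a read-only consumer (e.g. zuul-web)
# surfaces an error instead of fetching branches on its own.
class BranchCache:
    def __init__(self, read_only: bool):
        self.read_only = read_only
        self._branches: dict[str, list[str]] = {}

    def set_branches(self, project: str, branches: list[str]) -> None:
        # Scheduler path: store what was fetched from the remote (e.g. GitHub).
        self._branches[project] = branches

    def get_branches(self, project: str) -> list[str]:
        if project in self._branches:
            return self._branches[project]
        if self.read_only:
            # Read-only path: refuse to do the scheduler's job and report it.
            raise LookupError(
                "Will not fetch project branches as read-only is set")
        # Scheduler path would query the remote here and cache the result.
        raise RuntimeError("branch fetch not implemented in this sketch")
```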
clarkb | frickler: re https://review.opendev.org/c/openstack/project-config/+/906071 maybe let's land that and see if we get a different result from zuul when adding a new project to a running zuul vs trying to load all the projects when restarting zuul? | 16:17 |
clarkb | also I'm hesitant to require the github app because we know the permissions are overly aggressive | 16:18 |
frickler | clarkb: sure, I'm not against testing things. I also tried looking at the code a bit but it seems I still lack some basic understanding of how this all works, like how does a runtime error end up in the config error list? and how is that list persisted and under what condition would such an error ever get removed from it again? | 16:22 |
clarkb | it's a config error because it hasn't been able to load the configs from those branches | 16:24 |
clarkb | which I think is actually ok here because we don't actually load configs from those branches, but zuul will still check first I guess | 16:24 |
clarkb | I think to get the error to go away we have to force zuul to attempt to refetch the project info from github. I don't know what triggers that; it's possible a restart is required | 16:25 |
clarkb | if these projects were expected to provide their own job configs then this would be far more problematic but since they aren't I don't think anyone notices | 16:26 |
frickler | well we have "include: []" for all github projects, so there sure shouldn't be any attempt to load any config from those repos? | 16:40 |
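For reference, a minimal sketch of what such an entry looks like in a Zuul tenant configuration; the tenant, connection, and project names below are illustrative, not the exact OpenDev entries:

```yaml
# Hypothetical excerpt from a Zuul tenant config (e.g. main.yaml).
- tenant:
    name: openstack
    source:
      github:
        untrusted-projects:
          - sqlalchemy/sqlalchemy:
              include: []   # load no job configuration from this repo
```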
fungi | i have a vague recollection there was a reason to not try to fetch configs if the tenant config says not to load any from the project. could it be that the scheduler was smart enough not to try, but then zuul-web is confused by those branches not being in the cache? | 16:41 |
clarkb | ya I suppose that could be possible but I would need to go and reread the code | 16:59 |
frickler | ok, I looked at debug.log on zuul02 and there was a tenant reconfiguration event on openstack this morning and github returned just 401 for all queries like this: 2024-01-19 06:20:54,634 DEBUG zuul.GithubRequest: GET https://api.github.com/repos/sqlalchemy/alembic/branches?per_page=100 result: 401, size: 80, duration: 65 | 17:00 |
frickler | also 2024-01-19 06:20:57,358 INFO zuul.GithubConnection.GithubClientManager: No installation ID available for project sqlalchemy/sqlalchemy | 17:05 |
fungi | https://docs.github.com/en/rest/branches/branches indicates that method should be available anonymously, so not sure why it would result in a 401 (unauthorized) response unless that's github rate-limiting kicking in | 17:06 |
frickler | yes, I was just testing that it works if I try it manually | 17:07 |
fungi | i'm able to request the same url and get reasonable data back rather than a 401 | 17:07 |
fungi | right | 17:07 |
frickler | hmm, I tried to trigger the rate limit, which is 60 reqs/h, and if I do that, the response is a 403, not 401 | 17:10 |
frickler | so it looks like zuul may actually be using some kind of auth that github however considers invalid? | 17:11 |
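A quick way to reproduce what frickler and fungi are checking is to hit the same branches endpoint anonymously and with the token from zuul.conf and compare status codes: a 401 with the token points at invalid credentials, while anonymous rate limiting shows up as 403. A rough sketch (the token value is a placeholder, and the `requests` library is assumed to be available):

```python
# Rough sketch for comparing anonymous vs token-authenticated requests
# against the GitHub branches endpoint discussed above.
import requests

URL = "https://api.github.com/repos/sqlalchemy/sqlalchemy/branches?per_page=100"
TOKEN = "REPLACE_WITH_api_token_FROM_zuul.conf"  # placeholder, never paste the real secret here

# Anonymous request: 200 unless the unauthenticated rate limit (60/h) kicks in (403).
anon = requests.get(URL, timeout=30)
print("anonymous:", anon.status_code)

# Authenticated request: a 401 here means GitHub rejects the token itself.
auth = requests.get(URL, headers={"Authorization": f"token {TOKEN}"}, timeout=30)
print("with token:", auth.status_code)
```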
corvus | likely the api token since https://review.opendev.org/794688 | 17:12 |
fungi | so maybe our api token expired or was revoked? | 17:25 |
clarkb | I think you can list api tokens in github if you login somewhere | 17:25 |
clarkb | that might tell us | 17:25 |
clarkb | I'm able to keep an eye on the gitea 1.21.4 upgrade today if we think it is safe to do so https://review.opendev.org/c/opendev/system-config/+/906062 | 17:39 |
clarkb | the review.o.o cert check did pass last night so ya I guess just a race between systems | 17:39 |
frickler | o.k., I confirmed that when using the api_token that is in the zuul.conf on zuul02, github returns a 401. I also double checked my command with a personal token, that gives a 200. so the question is where did that token come from? I don't see any token in either the opendevadmin or openstackadmin account | 19:42 |
clarkb | I'm not sure. I suspect corvus probably set it up and may recall. However isn't the username part of the connection details for that token? | 19:44 |
clarkb | but ya maybe github removed it for some reason and we need to make a new one | 19:44 |
fungi | git history should at least narrow it down to a particular timeframe as far as when it was added, in which case there may be contemporary discussion in logs regarding how | 19:44 |
frickler | hmm, my firefox history found https://review.opendev.org/c/zuul/zuul/+/794688 , but that is very recent. zuul.conf is pretty much exactly one year older | 19:49 |
frickler | had to go very far back and found this https://meetings.opendev.org/irclogs/%23opendev/%23opendev.2021-06-03.log.html#t2021-06-03T23:07:59 | 19:54 |
frickler | I can try to generate a fresh one on monday or maybe tomorrow, off for now | 19:56 |
fungi | sorry, i meant git blame in the group_vars on bridge, which shows api_token was added to the zuul-scheduler group by a commit made 2021-06-03 | 19:57 |
fungi | but yes, that coincides with the irc log you found | 19:57 |
opendevreview | Merged opendev/system-config master: Update gitea to 1.21.4 https://review.opendev.org/c/opendev/system-config/+/906062 | 20:01 |
clarkb | that should start deploying in a few minutes. The zuul and eavesdrop jobs should go first | 20:13 |
clarkb | all of the giteas appear upgraded | 20:28 |
clarkb | and the infra-prod-service-gitea job was successful | 20:28 |
clarkb | I have successfully cloned system-config too. This is looking good to me | 20:29 |