corvus | there are 3 changes in the zuul openstack tenant; i'm dequeuing them manually to let the automated zuul restart playbook finish gracefully | 14:37 |
---|---|---|
fungi | thanks! i'm around for a few minutes, but need to leave to get lunch soon | 14:40 |
corvus | i think i'll just wait for it to finish; it's rebooting mergers now; probably takes a few minutes for each, so probably nothing happening until about 1500 anyway :) | 14:41 |
fungi | zuul does look pretty quiet, but i guess that's to be expected on a saturday | 14:48 |
corvus | it's restarting schedulers now, so getting close to being finished... then i can restart it again :) | 14:53 |
corvus | okay it's done; i'm going to ... dequeue some more changes, then stop all of zuul | 15:03 |
corvus | speaking of... https://review.opendev.org/867177 would be really helpful to have merged. :) | 15:04 |
corvus | as a consequence the opendev-prod-hourly jobs aren't going to be dequeued, they're just going to be killed | 15:04 |
Clark[m] | That is fine they'll run again in an hour | 15:05 |
corvus | yeah, but if it worked i could gracefully dequeue and re-enqueue them :) | 15:05 |
Clark[m] | Is there a reason to dequeue first? Aren't we stopping zuul then clearing all the state anyway? | 15:06 |
corvus | so that i can re-enqueue them with a click | 15:06 |
Clark[m] | Ah | 15:06 |
corvus | everything is down now | 15:07 |
corvus | running docker-compose run scheduler bash | 15:09 |
corvus | and zuul-admin delete-state | 15:09 |
corvus | on zuul01 | 15:09 |
corvus | it's not fast | 15:10 |
corvus | the zk data size is progressing down on grafana | 15:11 |
corvus | it's done | 15:11 |
corvus | started zuul01 scheduler and all mergers | 15:12 |
corvus | starting all executors | 15:12 |
corvus | starting zuul02 scheduler | 15:13 |
Clark[m] | Was zuul02 started after zuul01 steady state? | 15:13 |
corvus | starting both web servers | 15:13 |
corvus | no | 15:13 |
Clark[m] | (just curious on order of operations) | 15:13 |
corvus | starting everything at once should be fine even. | 15:14 |
corvus | they will handle different tenants | 15:14 |
corvus | it looks like all tenants except openstack are online now | 15:15 |
corvus | we're on x/f | 15:17 |
Clark[m] | I think last time we did this openstack took almost 20 minutes | 15:17 |
corvus | i think since then we may have added the threadpool executor processing of cat jobs | 15:18 |
corvus | it's done with the cat jobs now | 15:18 |
corvus | 2023-09-09 15:18:29,657 DEBUG zuul.GithubRequest: GET https://api.github.com/repos/eventlet/eventlet result: 403, size: 280, duration: 41 | 15:19 |
corvus | 2023-09-09 15:18:29,657 WARNING zuul.GithubRateLimitHandler: API rate limit reached, need to wait for 3259 seconds | 15:19 |
corvus | so... we're offline for an hour? | 15:20 |
corvus | https://paste.opendev.org/show/b9VVT76ngRpZDknxhnpe/ is the full list of projects without installation ids | 15:24 |
Clark[m] | The issue there being we get fewer API requests per $time when scanning those repos. I guess we should consider some cleanup if it causes us to be unable to reload configs on a clean startup | 15:25 |
corvus | yeah, if nobody is actually using those, then probably time to remove them | 15:26 |
Clark[m] | I suspect that Ansible and friends require the most API requests to evaluate. But also projects like cherrypy may not be necessary any longer | 15:26 |
corvus | pretty sure the app was installed in some of those repos previously | 15:26 |
corvus | anyway, i see two ways forward: either we manually remove them from the config now, or eat breakfast and come back at 16:15. | 15:27 |
corvus | (though there's no guarantee that we won't run into another rate limit then) | 15:28 |
Clark[m] | I think some of those are used and never had apps installed. As they are used on our end for integration stuff not reporting back with any coordination to the other side | 15:28 |
corvus | yep | 15:29 |
Clark[m] | We should already exclude things like loading configs from those repos. Does that help reduce the number of API queries we need to make? | 15:30 |
Clark[m] | In any case I suspect that the Ansible related repos are the majority of the problem here due to how they use branches. I think Ansible did have an app once but no longer. Removing it would likely break people including our system-config Ansible devel test job | 15:32 |
corvus | i'm not sure about all the api requests that are being made, but the one that tripped the limit was just fetching general info for the repo. | 15:33 |
Clark[m] | (for those that don't know Ansible repos seem to use the central repo as location for branches used to make PRs rather than separate forks. Similar in workflow to Gerrit if you hand wave a bit but the problem is you get real branches and lots of them) | 15:35 |
corvus | 2023-09-09 15:18:27,226 DEBUG zuul.GithubRequest: GET https://api.github.com/repositories/7833168/branches?per_page=100&page=7&per_page=100 result: 200, size: 14097, duration: 264 | 15:36 |
corvus | Clark: i think your theory is sound -- but maybe also add kibana to the list; that appears to be what that one is | 15:37 |
Clark[m] | I'm guessing 100 per page is the most we can request. I wonder if we can have git list them directly instead and bypass the rate limit | 15:38 |
Clark[m] | But that doesn't change all the work needed to be done against the branches | 15:39 |
corvus | git doesn't tell you if branch protection is enabled | 15:40 |
Clark[m] | Aha | 15:40 |
fungi | do we need to know about branch protection for repos that we're not loading configuration from and not gating? | 16:01 |
Clark[m] | You could still have centrally managed zuul config operating against the repos that need the protection info? | 16:05 |
fungi | ah, i suppose so | 16:08 |
fungi | anyway, disappearing for the next ~24 hours, won't have a computer with me, but will check back in tomorrow | 16:09 |
fungi | thanks for working on the restarts, sorry i had to miss most of it | 16:09 |
* fungi | vanishes in a puff of electrons | 16:10 |
corvus | 2023-09-09 16:13:25,109 WARNING zuul.GithubRateLimitHandler: API rate limit reached, need to wait for 3565 seconds | 16:13 |
corvus | this is impractical | 16:13 |
Clark[m] | If we remove those github projects from the list would we fail to load configs for all projects that rely on them? | 16:15 |
corvus | depends on the config, but if they're just referenced by individual jobs, then just those jobs would disappear. if they are in project-pipelines, then we'd lose the project pipeline config. | 16:16 |
Clark[m] | We'd lose the project pipeline config for the GitHub project? Or maybe I don't understand the nuance there. | 16:17 |
Clark[m] | If I understand generally though jobs like system-config's run job with Ansible devel and ara from source would go away but system config as a whole would be fine. There are likely other users of Ansible though | 16:21 |
Clark[m] | I think this job is the only usage of ara according to codesearch | 16:21 |
corvus | Clark: yes, we'd lose the ppc for the github project | 16:23 |
corvus | summary: the current status is that the scheduler is waiting on the github api limit to clear in order to finish loading the openstack tenant, at which point the web servers will come online, and the system will be fully up. | 16:32 |
corvus | that will take at least another 45 minutes, and if the api limit is exceeded again, another hour after that, and so on. | 16:32 |
corvus | Looks like it's up | 17:44 |
opendevreview | Ghanshyam proposed openstack/project-config master: Retire os-win: end project gating https://review.opendev.org/c/openstack/project-config/+/894407 | 18:50 |
opendevreview | Ghanshyam proposed openstack/project-config master: Retire compute-hyperv: end project gating https://review.opendev.org/c/openstack/project-config/+/894408 | 18:52 |
opendevreview | Ghanshyam proposed openstack/project-config master: Retire networking-hyperv: end project gating https://review.opendev.org/c/openstack/project-config/+/894409 | 18:54 |
opendevreview | Ghanshyam proposed openstack/project-config master: Retire oswin-tempest-plugin: end project gating https://review.opendev.org/c/openstack/project-config/+/894410 | 18:55 |
opendevreview | Radosław Piliszek proposed opendev/system-config master: Add codesearch to cacti https://review.opendev.org/c/opendev/system-config/+/894417 | 19:49 |
opendevreview | Ghanshyam proposed openstack/project-config master: Update theacl for retired winstacker project repo https://review.opendev.org/c/openstack/project-config/+/894418 | 19:51 |
opendevreview | Ghanshyam proposed openstack/project-config master: Update the gerrit acl for retired winstacker project https://review.opendev.org/c/openstack/project-config/+/894418 | 19:53 |
opendevreview | Ghanshyam proposed openstack/project-config master: Retire os-win: end project gating https://review.opendev.org/c/openstack/project-config/+/894407 | 19:53 |
opendevreview | Ghanshyam proposed openstack/project-config master: Retire compute-hyperv: end project gating https://review.opendev.org/c/openstack/project-config/+/894408 | 19:55 |
opendevreview | Ghanshyam proposed openstack/project-config master: Retire networking-hyperv: end project gating https://review.opendev.org/c/openstack/project-config/+/894409 | 19:55 |
opendevreview | Ghanshyam proposed openstack/project-config master: Retire oswin-tempest-plugin: end project gating https://review.opendev.org/c/openstack/project-config/+/894410 | 19:55 |
opendevreview | Ghanshyam proposed openstack/project-config master: Retire os-win: remove project from infra https://review.opendev.org/c/openstack/project-config/+/894419 | 19:59 |
opendevreview | Ghanshyam proposed openstack/project-config master: Retire compute-hyperv: remove project from infra https://review.opendev.org/c/openstack/project-config/+/894420 | 20:02 |
opendevreview | Ghanshyam proposed openstack/project-config master: Retire networking-hyperv: remove project from infra https://review.opendev.org/c/openstack/project-config/+/894441 | 20:05 |
opendevreview | Ghanshyam proposed openstack/project-config master: Retire oswin-tempest-plugin: remove project from infra https://review.opendev.org/c/openstack/project-config/+/894442 | 20:10 |
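
The two `zuul.GithubRateLimitHandler` warnings quoted in the log give a wait in seconds rather than a wall-clock time. As a rough sketch of the arithmetic behind "come back at 16:15", the snippet below adds the reported wait to the log timestamp. The helper name `rate_limit_reset` is hypothetical and is not part of Zuul; the timestamp format is assumed from the two log lines pasted above, and the result only estimates when that particular wait ends, not when the tenant finishes loading.

```python
from datetime import datetime, timedelta


def rate_limit_reset(log_timestamp: str, wait_seconds: int) -> datetime:
    """Estimate when the wait announced in a rate-limit log line ends.

    Hypothetical helper for illustration only; not a Zuul API.
    """
    # Timestamps in the pasted log lines look like "2023-09-09 15:18:29,657",
    # with a comma before the milliseconds.
    logged_at = datetime.strptime(log_timestamp, "%Y-%m-%d %H:%M:%S,%f")
    return logged_at + timedelta(seconds=wait_seconds)


# Values copied from the two warnings quoted in the log.
first = rate_limit_reset("2023-09-09 15:18:29,657", 3259)
second = rate_limit_reset("2023-09-09 16:13:25,109", 3565)

print(first.strftime("%H:%M:%S"))
print(second.strftime("%H:%M:%S"))
```

The first estimate falls just before the 16:15 mark mentioned in the discussion, and the second warning appears shortly after it, which matches the caveat that clearing one limit does not guarantee the next scan will not hit another.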