@jim:acmegating.com | swest: 809414 is looking pretty good to me | 00:09 |
---|---|---|
@jim:acmegating.com | mhu: thanks, that's a lot better :) i left one more question on the change | 00:21 |
@gtema:matrix.org | > <@jim:acmegating.com> @foodster:matrix.org: honestly no. for zuul to work in a gating environment, it needs to be in control of merging, otherwise its testing and operation isn't valid. if you can't alter the gitlab configuration to work with zuul, or update the zuul gitlab driver to work with your workflow, then i don't think trying to work around that in jobs is the right way to go. | 04:54 |
in my setup (without ff) I get zuul working without problems, even with multiple changes in parallel. Surely I get merge conflicts, but that is the same with github and even gerrit. I would be glad if we could merge squashing into Zuul soon so that commit messages also look reasonable. | ||
-@gerrit:opendev.org- Ian Wienand proposed: [zuul/zuul-registry] Fix manifest HEAD response when looking up by label https://review.opendev.org/c/zuul/zuul-registry/+/809528 | 06:35 | |
@iwienand:matrix.org | corvus: Clark ^ that fixes the issue reported in #opendev manifesting on https://review.opendev.org/c/opendev/system-config/+/809488/. i could replicate this locally in my test environment, and ^ fixed it | 06:36 |
@iwienand:matrix.org | if it looks ok, probably easiest way to confirm is to merge it and recheck on 809488 | 06:37 |
-@gerrit:opendev.org- Simon Westphahl proposed: [zuul/zuul] wip: Simplified attribute API for ZKObjects https://review.opendev.org/c/zuul/zuul/+/809532 | 07:29 | |
@westphahl:matrix.org | corvus: ^ felixedel and I had an idea to simplify the zkobject API a bit so we don't have to write `x.updateAttributes(...)` all over the place | 07:36 |
@avass:vassast.org | I'm trying to temporarily set up executors outside of openshift and use routes with SNI to connect them to zookeeper but I can't get that to work. So is that just not supported by kazoo or am I doing something wrong? | 08:05 |
@avass:vassast.org | tobiash: any idea ^ ? | 08:05 |
@tobias.henkel:matrix.org | we have cross region zk working, need to check our deployment | 08:05 |
@avass:vassast.org | I can connect to zk with `openssl s_client` so if I'm missing something it's probably something in our executor/docker-compose config | 08:06 |
@tobias.henkel:matrix.org | avass: we're using a nodeport service for zk, not sni | 08:06 |
@avass:vassast.org | tobiash: ok got it, was trying to move away from that | 08:06 |
@avass:vassast.org | are executors still using gearman now, or is that completely replaced by zookeeper now? | 08:09 |
@tobias.henkel:matrix.org | avass: I think sni needs to be added here: https://github.com/python-zk/kazoo/blob/6337fd6f72b59fb20886f980f2e0d6d41525dc35/kazoo/handlers/utils.py#L242 | 08:13 |
@avass:vassast.org | tobiash: doesn't look like it should be too hard to create a PR to support that | 08:16 |
@tobias.henkel:matrix.org | could be as easy as adding the server_hostname argument there | 08:31 |
@avass:vassast.org | yeah | 08:39 |
-@gerrit:opendev.org- Felix Edel proposed: [zuul/zuul] Fix wrong if condition in job request state watch https://review.opendev.org/c/zuul/zuul/+/809629 | 10:34 | |
-@gerrit:opendev.org- Matthieu Huin https://matrix.to/#/@mhuin:matrix.org proposed: [zuul/zuul] Web UI: make more filters selectable in build, buildset searches https://review.opendev.org/c/zuul/zuul/+/793159 | 10:57 | |
-@gerrit:opendev.org- Simon Westphahl proposed: | 11:06 | |
- [zuul/zuul] Move common change cache related methods to mixin https://review.opendev.org/c/zuul/zuul/+/809632 | ||
- [zuul/zuul] Simplify Zookeeper change cache API https://review.opendev.org/c/zuul/zuul/+/809633 | ||
@westphahl:matrix.org | Clark: re your comments on 806556 I pushed 809632 + 809633 as a follow up | 11:13 |
-@gerrit:opendev.org- Felix Edel proposed: | 12:54 | |
- [zuul/zuul] Don't use executor.builds when processing build result events https://review.opendev.org/c/zuul/zuul/+/808091 | ||
- [zuul/zuul] Fix race in test_data_return_child_from_retried_paused_job https://review.opendev.org/c/zuul/zuul/+/808918 | ||
- [zuul/zuul] Don't use executor.builds to find out if tests are settled https://review.opendev.org/c/zuul/zuul/+/808792 | ||
- [zuul/zuul] Remove the local builds list from the executor client https://review.opendev.org/c/zuul/zuul/+/809175 | ||
-@gerrit:opendev.org- Simon Westphahl proposed: | 13:53 | |
- [zuul/zuul] Move common change cache related methods to mixin https://review.opendev.org/c/zuul/zuul/+/809632 | ||
- [zuul/zuul] Simplify Zookeeper change cache API https://review.opendev.org/c/zuul/zuul/+/809633 | ||
-@gerrit:opendev.org- Matthieu Huin https://matrix.to/#/@mhuin:matrix.org proposed: [zuul/zuul] REST API: Implement nodes filtering https://review.opendev.org/c/zuul/zuul/+/736042 | 14:16 | |
@mhuin:matrix.org | hello zuul-maint, this change is a +3 away from merging: https://review.opendev.org/c/zuul/zuul/+/806201 | 14:20 |
@jim:acmegating.com | swest, felixedel: did you see my comment on 809414? | 14:27 |
@jim:acmegating.com | i'm asking because it looks like the only improvement in 809532 was dealing with 2 updateAttributes calls that i think can be combined into one. so activeContext ends up being more verbose... | 14:28 |
@jim:acmegating.com | i guess there's still the question of enqueue_time in that example -- that would either be 2 updateAttributes calls or a dict. so maybe in that case activeContext wins? | 14:30 |
@jim:acmegating.com | swest, felixedel: ok, it lgtm. :) | 14:32 |
-@gerrit:opendev.org- Zuul merged on behalf of Ian Wienand: [zuul/zuul-registry] Fix manifest HEAD response when looking up by label https://review.opendev.org/c/zuul/zuul-registry/+/809528 | 16:19 | |
@mhuin:matrix.org | hello zuul-maint, this change was +2'ed before the rebase, for your consideration: https://review.opendev.org/c/zuul/zuul/+/736042 | 16:25 |
@spamaps:spamaps.ems.host | If you were going to deploy Zuul into a kubernetes cluster where you don't have Admin.. just curious.. how might any of you do that? The operator isn't an option for me. | 16:41 |
@spamaps:spamaps.ems.host | (Specifically a GKE cluster, so.. the full power of GCP is available) | 16:42 |
@jim:acmegating.com | spamaps: i can halfway help with that: here is a deployment i set up on GKE, without using the operator (it predates it): https://gerrit.googlesource.com/zuul/ops/ | 16:43 |
@jim:acmegating.com | however, we do have full access to that | 16:43 |
@jim:acmegating.com | spamaps: the service account setup is particularly interesting | 16:44 |
@spamaps:spamaps.ems.host | I can create a project and have full admin in that project, so it shouldn't be too much of a problem. | 16:45 |
@spamaps:spamaps.ems.host | That's perfect, thanks! | 16:45 |
@spamaps:spamaps.ems.host | I'd probably tweak things a little bit to use CloudSQL.. was there some reason you didn't use CloudSQL? | 16:46 |
@jim:acmegating.com | yeah, that's the weak point. i probably didn't know about cloudsql :) | 16:46 |
@spamaps:spamaps.ems.host | It was pretty downplayed by gcloud until recently. | 16:47 |
@spamaps:spamaps.ems.host | They *really* want you to use BigTable. :) | 16:47 |
@jim:acmegating.com | (or maybe it wasn't enabled for that project or something, i don't remember) | 16:47 |
-@gerrit:opendev.org- Tobias Henkel proposed: [zuul/zuul] Index build_id in artifact table https://review.opendev.org/c/zuul/zuul/+/758579 | 17:11 | |
-@gerrit:opendev.org- Jeremy Stanley proposed: | 17:47 | |
- [zuul/zuul-jobs] Support verbose showconfig in tox siblings https://review.opendev.org/c/zuul/zuul-jobs/+/806621 | ||
- [zuul/zuul-jobs] Include tox_extra_args in tox siblings tasks https://review.opendev.org/c/zuul/zuul-jobs/+/806612 | ||
- [zuul/zuul-jobs] Explicit tox_extra_args in zuul-jobs-test-tox https://review.opendev.org/c/zuul/zuul-jobs/+/809456 | ||
- [zuul/zuul-jobs] Pin to funcparserlib prerelease for new SetupTools https://review.opendev.org/c/zuul/zuul-jobs/+/809885 | ||
@fungicide:matrix.org | the stack at topic:tox-role is growing a number of bitrot fixes now | 17:48 |
@fungicide:matrix.org | i guess just by virtue of exercising a lot of jobs which are being hit by recent changes in the transitive dependency set | 17:50 |
@tobias.henkel:matrix.org | spamaps: do you have access to privileged scc? | 17:59 |
@fungicide:matrix.org | okay, so we have a dilemma with the funcparserlib workaround (809885) in that it's insufficient for the zuul-jobs-test-fetch-sphinx-tarball-ubuntu-xenial job because the prerelease which works with latest setuptools declares that it doesn't support python 3.5 | 18:33 |
@fungicide:matrix.org | the simplest solution is to drop the zuul-jobs-test-fetch-sphinx-tarball-ubuntu-xenial job, though alternatively we could try to find a way to test that role without installing the doc/requirements.txt from zuul-jobs | 18:34 |
@fungicide:matrix.org | does anyone have a preference? | 18:35 |
@fungicide:matrix.org | note that reworking the fetch-sphinx-tarball testing doesn't enable us to drop the funcparserlib pin because we'll presumably still need that to be able to build zuul-jobs docs with latest setuptools | 18:36 |
@fungicide:matrix.org | another solution might be to drop the blockdiag and seqdiag charts in https://zuul-ci.org/docs/zuul-jobs/docker-image.html | 18:51 |
@fungicide:matrix.org | replace them with pre-rendered images or use a different sphinx plugin | 18:52 |
@fungicide:matrix.org | anyway, i'll propose no longer testing fetch-sphinx-tarball on xenial as a straw man | 18:52 |
@clarkb:matrix.org | I'm good with dropping xenial. Zuul doesn't run on python 3.5 either | 18:54 |
@fungicide:matrix.org | oh, except we auto-generate that mapping, so i'm not sure it's easy to just not test that role on that one particular platform without dropping the platform across all auto-generated jobs? | 18:56 |
@fungicide:matrix.org | we do have a mechanism to switch it to non-voting, but the job is going to be permanently broken so there's no point in even running it | 18:57 |
@fungicide:matrix.org | i guess i could extend tools/update-test-platforms.py to add an exclusion list implemented similarly to the non-voting list but this is quickly becoming no longer the trivial solution i thought it might be | 18:58 |
@fungicide:matrix.org | testing on xenial is also going to get a lot worse soon, as pip has been warning for a while that it's now just a few months away from no longer supporting python 3.5, so we'll need to switch to not using recent toolchains on that platform pretty soon if we want to keep it around | 19:01 |
@fungicide:matrix.org | which leads me to think that this may be our cue to drop xenial testing | 19:03 |
@fungicide:matrix.org | i'm also coming to realize that this is not a topic people are going to want to discuss late on a summer friday, so i'll leave myself a reminder to bring it up again next week. just be aware that a chunk of zuul-jobs is blocked for new changes until we address these issues | 19:06 |
@fungicide:matrix.org | okay, i've summarized on the ml for now: http://lists.zuul-ci.org/pipermail/zuul-discuss/2021-September/001727.html | 19:36 |
@fungicide:matrix.org | welcome bridgefan! i see you found us | 19:37 |
@jim:acmegating.com | fungi, Clark: i think in general the right thing to do is to drop support in zuul-jobs as we need to. we're not here to support things beyond their natural life :) | 19:46 |
@bridgefan:matrix.org | fungi: thanks - yes just trying to get acquainted with matrix | 19:47 |
@fungicide:matrix.org | corvus: in a general sense i agree, but is that specifically a vote for going ahead and just dropping all xenial testing in zuul-jobs, or for finding a way to drop specific jobs from our autogenerated list when we cease to expect them to work any longer? | 19:47 |
@fungicide:matrix.org | yesterday it was google cloud log uploads on python 2.7, today it's fetch-sphinx-tarball testing on python 3.5, but there will come a time when we need to decide there's little point in testing anything on 2.7 or 3.5 | 19:50 |
@fungicide:matrix.org | so i'm trying to gauge whether we're there yet | 19:50 |
@jim:acmegating.com | fungi: is xenial eol? | 19:51 |
@fungicide:matrix.org | as of a few months ago, yes | 19:51 |
@fungicide:matrix.org | (april i think it was?) | 19:51 |
@jim:acmegating.com | looks like that's "end of standard support" but actual eol in 2024.... | 19:52 |
@fungicide:matrix.org | ahh, yeah, canonical does paid support beyond the end of standard support | 19:53 |
@jim:acmegating.com | anyway, dropping xenial sounds pretty reasonable to me | 19:53 |
@fungicide:matrix.org | thanks, i'll propose that and add it to the stack | 19:53 |
@jim:acmegating.com | Clark: i spent the morning digging into detailed postgres query timing with tobiash, and it looks like the pg query planner in postgres may be too unpredictable for comfort. with my most recent change, it may produce a result in 2ms, or 400ms depending on the project specified and what the query planner thinks is best based on its statistics. | 19:54 |
@jim:acmegating.com | since this is critical path for the scheduler, we may need another approach. | 19:56 |
@jim:acmegating.com | we could do your idea of eventually-consistent local time databases. or we could move it into ZK. or we could make a sql table designed just for this case. | 19:57 |
@jim:acmegating.com | opendev's time database is 30,000 files+directories (or 26,000 if you flatten it out to files only), and about 120MiB. | 19:58 |
@jim:acmegating.com | we currently only have a total of 50,000 znodes, so adding another 30k would be substantial. | 19:58 |
@jim:acmegating.com | if we did the sql route, we could just make a table with one row per job (and index all the selector columns), so it'd basically be a table with 26k rows | 20:00 |
@jim:acmegating.com | i guess an alternative would be to prune historical jobs from the db. i think we should do that, but doing that just for this seems wrong. | 20:02 |
@clarkb:matrix.org | moving to zk seems like it would be slower? | 20:02 |
@jim:acmegating.com | the zk gets are pretty fast; lemme check that | 20:02 |
@clarkb:matrix.org | Is the postgres thing perhaps related to having data in memory? Seems like mysql is subject to that as well (and why you want to tune big memory caches for it?) | 20:05 |
@clarkb:matrix.org | anyway I need lunch but will think this over. Also plan to review that sos change stack I've been working through some more | 20:05 |
@jim:acmegating.com | Clark: as best as i can tell it's based on the statistics that the analyze command comes up with | 20:07 |
@jim:acmegating.com | sometimes the query planner decides it should do a heapsort of the table; sometimes it decides it should use the index and walk backwards | 20:07 |
@jim:acmegating.com | a ZK get takes about 3ms | 20:08 |
@jim:acmegating.com | (that time is from opendev production) | 20:08 |
@tobias.henkel:matrix.org | I think since that data is potentially unbounded (especially with lots of branches) I'd vote for the special table in sql | 20:09 |
@jim:acmegating.com | my guess is performance would be about the same for a dedicated sql table (especially since our best case with our 18 million row table is 2ms) | 20:09 |
@tobias.henkel:matrix.org | we have 74M with 21k entries but with a time database that has been cleared 5 days ago | 20:10 |
@jim:acmegating.com | good point | 20:10 |
@tobias.henkel:matrix.org | (our time database is currently not persistent) | 20:10 |
@jim:acmegating.com | i'll do a sql mock up with the opendev production data and get some timings for that | 20:10 |
@tobias.henkel:matrix.org | thanks! | 20:11 |
-@gerrit:opendev.org- Jeremy Stanley proposed: | 20:35 | |
- [zuul/zuul-jobs] Pin to funcparserlib prerelease for new SetupTools https://review.opendev.org/c/zuul/zuul-jobs/+/809885 | ||
- [zuul/zuul-jobs] Add tox_config_file rolevar to tox https://review.opendev.org/c/zuul/zuul-jobs/+/806613 | ||
- [zuul/zuul-jobs] Support verbose showconfig in tox siblings https://review.opendev.org/c/zuul/zuul-jobs/+/806621 | ||
- [zuul/zuul-jobs] Include tox_extra_args in tox siblings tasks https://review.opendev.org/c/zuul/zuul-jobs/+/806612 | ||
- [zuul/zuul-jobs] Explicit tox_extra_args in zuul-jobs-test-tox https://review.opendev.org/c/zuul/zuul-jobs/+/809456 | ||
- [zuul/zuul-jobs] Stop testing playbooks/roles on Ubuntu Xenial https://review.opendev.org/c/zuul/zuul-jobs/+/809899 | ||
@bridgefan:matrix.org | test | 20:41 |
@clarkb:matrix.org | fungi: your stack lgtm | 21:05 |
@jim:acmegating.com | all +3 | 21:10 |
@jim:acmegating.com | i'm currently running a test where i iterate over every build in our db (18M) and either insert or update the row-per-job table i'm proposing. basically, it's doing exactly what our scheduler would have done if we'd had this in place for the past 4 years. it's looking like pg scales linearly with the pair of queries (select+insert or update) taking about 1ms for every 10k rows in the table. so we're at 3ms for the operation now. | 21:15 |
@jim:acmegating.com | i think that should be acceptable | 21:16 |
@clarkb:matrix.org | swest: found an issue in https://review.opendev.org/c/zuul/zuul/+/807102 but the rest of the stack lgtm and I expect it is quite close to merging | 21:47 |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] Move time database to SQL https://review.opendev.org/c/zuul/zuul/+/808841 | 22:50 | |
@jim:acmegating.com | Clark, tobiash: ^ okay... that was "easy" /s | 22:51 |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed on behalf of Simon Westphahl: [zuul/zuul] Clean up dangling cache data nodes more often https://review.opendev.org/c/zuul/zuul/+/807102 | 22:52 | |
@jim:acmegating.com | Clark: ^ i think we can ninja-fix that ^ | 22:52 |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed on behalf of Simon Westphahl: | 22:53 | |
- [zuul/zuul] Move common change cache related methods to mixin https://review.opendev.org/c/zuul/zuul/+/809632 | ||
- [zuul/zuul] Simplify Zookeeper change cache API https://review.opendev.org/c/zuul/zuul/+/809633 | ||
@clarkb:matrix.org | +2'd | 22:56 |
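
For readers following the SNI thread (08:05 to 08:39): the suggestion was that passing a `server_hostname` argument when the TLS socket is wrapped might be enough. The sketch below is not kazoo's code and has not been checked against it; it only shows how the Python standard library's `SSLContext.wrap_socket` accepts `server_hostname`, which is what makes the client send SNI. The helper name `wrap_with_sni` and the hostname are hypothetical.

```python
import socket
import ssl


def wrap_with_sni(sock, hostname, cafile=None):
    """Wrap a TCP socket for TLS and request `hostname` via SNI.

    Hypothetical helper for illustration only; it is not part of kazoo.
    """
    # A default client context verifies the certificate and the hostname.
    context = ssl.create_default_context(cafile=cafile)
    # server_hostname is what puts the SNI extension into the ClientHello,
    # which an SNI-based router needs in order to pick the right backend.
    return context.wrap_socket(sock, server_hostname=hostname)


if __name__ == "__main__":
    # Wrapping an unconnected socket defers the handshake until connect().
    raw = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    tls = wrap_with_sni(raw, "zk.example.com")
    print(tls.server_hostname)
    tls.close()
```

Because the socket is wrapped before connecting, nothing is sent on the network here; a real client would call `connect()` afterwards and the handshake would carry the requested hostname.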
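
The row-per-job table discussed at 20:00 and timed at 21:15 (a select followed by an insert or update) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the schema from change 808841: it uses `sqlite3` from the standard library as a stand-in for postgres, and the table name `job_time`, its columns, and the helper functions are all hypothetical.

```python
import sqlite3

# Hypothetical schema: one row per job, keyed by the selector columns.
# The composite primary key doubles as the index used for lookups.
SCHEMA = """
CREATE TABLE IF NOT EXISTS job_time (
    tenant   TEXT NOT NULL,
    project  TEXT NOT NULL,
    branch   TEXT NOT NULL,
    job      TEXT NOT NULL,
    duration REAL NOT NULL,
    PRIMARY KEY (tenant, project, branch, job)
)
"""
WHERE = "tenant = ? AND project = ? AND branch = ? AND job = ?"


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def get_duration(conn, key):
    """Return the stored duration for a (tenant, project, branch, job) key."""
    row = conn.execute(
        f"SELECT duration FROM job_time WHERE {WHERE}", key
    ).fetchone()
    return None if row is None else row[0]


def record_duration(conn, key, duration):
    """Select first, then insert or update, mirroring the pair of queries."""
    if get_duration(conn, key) is None:
        conn.execute(
            "INSERT INTO job_time VALUES (?, ?, ?, ?, ?)", (*key, duration)
        )
    else:
        conn.execute(
            f"UPDATE job_time SET duration = ? WHERE {WHERE}", (duration, *key)
        )
    conn.commit()
```

Since each job occupies exactly one row, the table size is bounded by the number of distinct jobs rather than by the number of builds, which is the property the discussion was after.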