Friday, 2021-09-17

@jim:acmegating.comswest: 809414 is looking pretty good to me00:09
@jim:acmegating.commhu: thanks, that's a lot better :)  i left one more question on the change00:21
@gtema:matrix.org> <@jim:acmegating.com> @foodster:matrix.org: honestly no.  for zuul to work in a gating environment, it needs to be in control of merging, otherwise its testing and operation isn't valid.  if you can't alter the gitlab configuration to work with zuul, or update the zuul gitlab driver to work with your workflow, then i don't think trying to work around that in jobs is the right way to go.04:54
in my setup (without ff) I get zuul working without problems even with multiple changes in parallel. Surely I get merge conflicts, but that is same with github and even gerrit. I would be glad if we can merge squashing into Zuul fast so that also commit messages look reasonable.
-@gerrit:opendev.org- Ian Wienand proposed: [zuul/zuul-registry] Fix manifest HEAD response when looking up by label https://review.opendev.org/c/zuul/zuul-registry/+/80952806:35
@iwienand:matrix.orgcorvus: Clark ^ that fixes the issue reported in #opendev manifesting on https://review.opendev.org/c/opendev/system-config/+/809488/.  i could replicate this locally in my test environment, and ^ fixed it06:36
@iwienand:matrix.orgif it looks ok, probably easiest way to confirm is to merge it and recheck on 80948806:37
-@gerrit:opendev.org- Simon Westphahl proposed: [zuul/zuul] wip: Simplified attribute API for ZKObjects https://review.opendev.org/c/zuul/zuul/+/80953207:29
@westphahl:matrix.orgcorvus: ^ felixedel and I had an idea to simplify the zkobject API a bit so we don't have to write `x.updateAttributes(...)` all over the place07:36
@avass:vassast.orgI'm trying to temporarily set up executors outside of openshift and use routes with SNI to connect them to zookeeper but I can't get that to work. So is that just not supported by kazoo or am I doing something wrong?08:05
@avass:vassast.orgtobiash: any idea ^ ?08:05
@tobias.henkel:matrix.orgwe have cross region zk working, need to check our deployment08:05
@avass:vassast.orgI can connect to zk with `openssl s_client` so if I'm missing something it's probably something in our executor/docker-compose config08:06
@tobias.henkel:matrix.orgavass: we're using a nodeport service for zk, not sni08:06
@avass:vassast.orgtobiash: ok got it, was trying to move away from that08:06
@avass:vassast.orgare executors still using gearman now, or is that completely replaced by zookeeper now?08:09
@tobias.henkel:matrix.orgavass: I think sni needs to be added here: https://github.com/python-zk/kazoo/blob/6337fd6f72b59fb20886f980f2e0d6d41525dc35/kazoo/handlers/utils.py#L24208:13
@avass:vassast.orgtobiash: doesn't look like it should be too hard to create a PR to support that08:16
@tobias.henkel:matrix.orgcould be as easy as adding the server_hostname argument there08:31
@avass:vassast.orgyeah08:39
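(Editor's sketch for the SNI point above. It uses only the Python standard library, not kazoo itself; the helper name `wrap_with_sni` and the hostname `zk.example.com` are hypothetical. It shows what passing `server_hostname` to `SSLContext.wrap_socket` does. Whether kazoo needs exactly this change at the linked line is only what the discussion suggests, not something confirmed here.)

```python
import socket
import ssl


def wrap_with_sni(sock, hostname, context=None):
    """Wrap a TCP socket for TLS, sending `hostname` via SNI.

    Hypothetical helper: it only illustrates the `server_hostname`
    argument discussed above and is not kazoo's code.
    """
    context = context or ssl.create_default_context()
    # server_hostname is placed in the TLS ClientHello (SNI), which is
    # what an SNI-based router uses to pick the backend.
    return context.wrap_socket(sock, server_hostname=hostname)


# No connection is made here; the handshake would happen on connect().
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tls_sock = wrap_with_sni(sock, "zk.example.com")
sni_name = tls_sock.server_hostname
tls_sock.close()
```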
-@gerrit:opendev.org- Felix Edel proposed: [zuul/zuul] Fix wrong if condition in job request state watch https://review.opendev.org/c/zuul/zuul/+/80962910:34
-@gerrit:opendev.org- Matthieu Huin https://matrix.to/#/@mhuin:matrix.org proposed: [zuul/zuul] Web UI: make more filters selectable in build, buildset searches https://review.opendev.org/c/zuul/zuul/+/79315910:57
-@gerrit:opendev.org- Simon Westphahl proposed:11:06
- [zuul/zuul] Move common change cache related methods to mixin https://review.opendev.org/c/zuul/zuul/+/809632
- [zuul/zuul] Simplify Zookeeper change cache API https://review.opendev.org/c/zuul/zuul/+/809633
@westphahl:matrix.orgClark: re your comments on 806556 I pushed 809632 + 809633 as a follow up11:13
-@gerrit:opendev.org- Felix Edel proposed:12:54
- [zuul/zuul] Don't use executor.builds when processing build result events https://review.opendev.org/c/zuul/zuul/+/808091
- [zuul/zuul] Fix race in test_data_return_child_from_retried_paused_job https://review.opendev.org/c/zuul/zuul/+/808918
- [zuul/zuul] Don't use executor.builds to find out if tests are settled https://review.opendev.org/c/zuul/zuul/+/808792
- [zuul/zuul] Remove the local builds list from the executor client https://review.opendev.org/c/zuul/zuul/+/809175
-@gerrit:opendev.org- Simon Westphahl proposed:13:53
- [zuul/zuul] Move common change cache related methods to mixin https://review.opendev.org/c/zuul/zuul/+/809632
- [zuul/zuul] Simplify Zookeeper change cache API https://review.opendev.org/c/zuul/zuul/+/809633
-@gerrit:opendev.org- Matthieu Huin https://matrix.to/#/@mhuin:matrix.org proposed: [zuul/zuul] REST API: Implement nodes filtering https://review.opendev.org/c/zuul/zuul/+/73604214:16
@mhuin:matrix.orghello zuul-maint, this change is a +3 away from merging: https://review.opendev.org/c/zuul/zuul/+/80620114:20
@jim:acmegating.comswest, felixedel: did you see my comment on 809414?14:27
@jim:acmegating.comi'm asking because it looks like the only improvement in 809532 was dealing with 2 updateAttributes calls that i think can be combined into one.  so activeContext ends up being more verbose...14:28
@jim:acmegating.comi guess there's still the question of enqueue_time in that example -- that would either be 2 updateAttributes calls or a dict.  so maybe in that case activeContext wins?14:30
@jim:acmegating.comswest, felixedel: ok, it lgtm.  :)14:32
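(Editor's sketch of the trade-off discussed above: one write per update call versus a context manager that batches changes into a single write. `SketchObject`, `update_attributes`, `active_context` and the `writes` counter are hypothetical stand-ins and are not Zuul's ZKObject implementation.)

```python
import contextlib


class SketchObject:
    """Hypothetical stand-in for an object persisted in ZooKeeper."""

    def __init__(self):
        self.data = {}
        self.writes = 0          # counts simulated writes to the store
        self._batching = False

    def _save(self):
        # Stands in for serializing the object and writing it once.
        self.writes += 1

    def update_attributes(self, **kwargs):
        # Each call outside a context costs one write.
        self.data.update(kwargs)
        if not self._batching:
            self._save()

    @contextlib.contextmanager
    def active_context(self):
        # Changes made inside the block are saved together on success.
        self._batching = True
        try:
            yield self
        finally:
            self._batching = False
        self._save()


obj = SketchObject()
obj.update_attributes(state="running")
obj.update_attributes(enqueue_time=1.0)      # two calls, two writes

with obj.active_context() as ctx:
    ctx.data["state"] = "done"
    ctx.data["enqueue_time"] = 2.0           # two changes, one write
```

In this sketch the two separate calls cost two writes, while the context block costs one, which is the case where the context style comes out ahead.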
-@gerrit:opendev.org- Zuul merged on behalf of Ian Wienand: [zuul/zuul-registry] Fix manifest HEAD response when looking up by label https://review.opendev.org/c/zuul/zuul-registry/+/80952816:19
@mhuin:matrix.orghello zuul-maint, this change was +2'ed before the rebase, for your consideration: https://review.opendev.org/c/zuul/zuul/+/73604216:25
@spamaps:spamaps.ems.hostIf you were going to deploy Zuul into a kubernetes cluster where you don't have Admin.. just curious.. how might any of you do that? The operator isn't an option for me.16:41
@spamaps:spamaps.ems.host(Specifically a GKE cluster, so.. the full power of GCP is available)16:42
@jim:acmegating.comspamaps: i can halfway help with that: here is a deployment i set up on GKE, without using the operator (it predates it): https://gerrit.googlesource.com/zuul/ops/16:43
@jim:acmegating.comhowever, we do have full access to that16:43
@jim:acmegating.comspamaps: the service account setup is particularly interesting16:44
@spamaps:spamaps.ems.hostI can create a project and have full admin in that project, so it shouldn't be too much of a problem.16:45
@spamaps:spamaps.ems.hostThat's perfect, thanks!16:45
@spamaps:spamaps.ems.hostI'd probably tweak things a little bit to use CloudSQL.. was there some reason you didn't use CloudSQL?16:46
@jim:acmegating.comyeah, that's the weak point.  i probably didn't know about cloudsql :)16:46
@spamaps:spamaps.ems.hostIt was pretty downplayed by gcloud until recently.16:47
@spamaps:spamaps.ems.hostThey *really* want you to use BigTable. :)16:47
@jim:acmegating.com(or maybe it wasn't enabled for that project or something, i don't remember)16:47
-@gerrit:opendev.org- Tobias Henkel proposed: [zuul/zuul] Index build_id in artifact table https://review.opendev.org/c/zuul/zuul/+/75857917:11
-@gerrit:opendev.org- Jeremy Stanley proposed:17:47
- [zuul/zuul-jobs] Support verbose showconfig in tox siblings https://review.opendev.org/c/zuul/zuul-jobs/+/806621
- [zuul/zuul-jobs] Include tox_extra_args in tox siblings tasks https://review.opendev.org/c/zuul/zuul-jobs/+/806612
- [zuul/zuul-jobs] Explicit tox_extra_args in zuul-jobs-test-tox https://review.opendev.org/c/zuul/zuul-jobs/+/809456
- [zuul/zuul-jobs] Pin to funcparserlib prerelease for new SetupTools https://review.opendev.org/c/zuul/zuul-jobs/+/809885
@fungicide:matrix.orgthe stack at topic:tox-role is growing a number of bitrot fixes now17:48
@fungicide:matrix.orgi guess just by virtue of exercising a lot of jobs which are being hit by recent changes in the transitive dependency set17:50
@tobias.henkel:matrix.orgspamaps: do you have access to privileged scc?17:59
@fungicide:matrix.orgokay, so we have a dilemma with the funcparserlib workaround (809885) in that it's insufficient for the zuul-jobs-test-fetch-sphinx-tarball-ubuntu-xenial job because the prerelease which works with latest setuptools declares that it doesn't support python 3.518:33
@fungicide:matrix.orgthe simplest solution is to drop the zuul-jobs-test-fetch-sphinx-tarball-ubuntu-xenial job, though alternatively we could try to find a way to test that role without installing the doc/requirements.txt from zuul-jobs18:34
@fungicide:matrix.orgdoes anyone have a preference?18:35
@fungicide:matrix.orgnote that reworking the fetch-sphinx-tarball testing doesn't enable us to drop the funcparserlib pin because we'll presumably still need that to be able to build zuul-jobs docs with latest setuptools18:36
@fungicide:matrix.organother solution might be to drop the blockdiag and seqdiag charts in https://zuul-ci.org/docs/zuul-jobs/docker-image.html18:51
@fungicide:matrix.orgreplace them with pre-rendered images or use a different sphinx plugin18:52
@fungicide:matrix.organyway, i'll propose no longer testing fetch-sphinx-tarball on xenial as a straw man18:52
@clarkb:matrix.orgI'm good with dropping xenial. Zuul doesn't run on python 3.5 either18:54
@fungicide:matrix.orgoh, except we auto-generate that mapping, so i'm not sure it's easy to just not test that role on that one particular platform without dropping the platform across all auto-generated jobs?18:56
@fungicide:matrix.orgwe do have a mechanism to switch it to non-voting, but the job is going to be permanently broken so there's no point in even running it18:57
@fungicide:matrix.orgi guess i could extend tools/update-test-platforms.py to add an exclusion list implemented similarly to the non-voting list but this is quickly becoming no longer the trivial solution i thought it might be18:58
@fungicide:matrix.orgtesting on xenial is also going to get a lot worse soon, as pip has been warning for a while that it's now just a few months away from no longer supporting python 3.5, so we'll need to switch to not using recent toolchains on that platform pretty soon if we want to keep it around19:01
@fungicide:matrix.orgwhich leads me to think that this may be our cue to drop xenial testing19:03
@fungicide:matrix.orgi'm also coming to realize that this is not a topic people are going to want to discuss late on a summer friday, so i'll leave myself a reminder to bring it up again next week. just be aware that a chunk of zuul-jobs is blocked for new changes until we address these issues19:06
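(Editor's sketch of the exclusion-list idea mentioned above, placed next to a non-voting list. It does not reproduce tools/update-test-platforms.py; the function `generate_jobs`, its parameters, and the role and platform lists are hypothetical and only follow the job naming seen in this log.)

```python
def generate_jobs(roles, platforms, non_voting=(), excluded=()):
    """Build one job entry per role/platform pair.

    Hypothetical generator: excluded jobs are skipped entirely,
    non-voting jobs are still generated but marked as such.
    """
    jobs = []
    for role in roles:
        for platform in platforms:
            name = f"zuul-jobs-test-{role}-{platform}"
            if name in excluded:
                continue  # permanently broken here, so do not run it
            jobs.append({"name": name, "voting": name not in non_voting})
    return jobs


jobs = generate_jobs(
    roles=["fetch-sphinx-tarball", "tox"],
    platforms=["ubuntu-xenial", "ubuntu-focal"],
    excluded={"zuul-jobs-test-fetch-sphinx-tarball-ubuntu-xenial"},
)
job_names = [job["name"] for job in jobs]
```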
@fungicide:matrix.orgokay, i've summarized on the ml for now: http://lists.zuul-ci.org/pipermail/zuul-discuss/2021-September/001727.html19:36
@fungicide:matrix.orgwelcome bridgefan! i see you found us19:37
@jim:acmegating.comfungi, Clark: i think in general the right thing to do is to drop support in zuul-jobs as we need to.  we're not here to support things beyond their natural life :)19:46
@bridgefan:matrix.orgfungi: thanks - yes just trying to get acquainted with matrix19:47
@fungicide:matrix.orgcorvus: in a general sense i agree, but is that specifically a vote for going ahead and just dropping all xenial testing in zuul-jobs, or for finding a way to drop specific jobs from our autogenerated list when we cease to expect them to work any longer?19:47
@fungicide:matrix.orgyesterday it was google cloud log uploads on python 2.7, today it's fetch-sphinx-tarball testing on python 3.5, but there will come a time when we need to decide there's little point in testing anything on 2.7 or 3.519:50
@fungicide:matrix.orgso i'm trying to gauge whether we're there yet19:50
@jim:acmegating.comfungi: is xenial eol?19:51
@fungicide:matrix.orgas of a few months ago, yes19:51
@fungicide:matrix.org(april i think it was?)19:51
@jim:acmegating.comlooks like that's "end of standard support" but actual eol in 2024....19:52
@fungicide:matrix.orgahh, yeah, canonical does paid support beyond the end of standard support19:53
@jim:acmegating.comanyway, dropping xenial sounds pretty reasonable to me19:53
@fungicide:matrix.orgthanks, i'll propose that and add it to the stack19:53
@jim:acmegating.comClark: i spent the morning digging into detailed postgres query timing with tobiash, and it looks like the pg query planner in postgres may be too unpredictable for comfort.  with my most recent change, it may produce a result in 2ms, or 400ms depending on the project specified and what the query planner thinks is best based on its statistics.19:54
@jim:acmegating.comsince this is critical path for the scheduler, we may need another approach.19:56
@jim:acmegating.comwe could do your idea of eventually-consistent local time databases.  or we could move it into ZK.  or we could make a sql table designed just for this case.19:57
@jim:acmegating.comopendev's time database is 30,000 files+directories (or 26,000 if you flatten it out to files only), and about 120MiB.19:58
@jim:acmegating.comwe currently only have a total of 50,000 znodes, so adding another 30k would be substantial.19:58
@jim:acmegating.comif we did the sql route, we could just make a table with one row per job (and index all the selector columns), so it'd basically be a table with 26k rows20:00
@jim:acmegating.comi guess an alternative would be to prune historical jobs from the db.  i think we should do that, but doing that just for this seems wrong.20:02
@clarkb:matrix.orgmoving to zk seems like it would be slower?20:02
@jim:acmegating.comthe zk gets are pretty fast; lemme check that20:02
@clarkb:matrix.orgIs the postgres thing perhaps related to having data in memory? Seems like mysql is subject to that as well (and why you want to tune big memory caches for it?)20:05
@clarkb:matrix.organyway I need lunch but will think this over. Also plan to review that sos change stack I've been working through some more20:05
@jim:acmegating.comClark: as best as i can tell it's based on the statistics that the analyze command comes up with20:07
@jim:acmegating.comsometimes the query planner decides it should do a heapsort of the table; sometimes it decides it should use the index and walk backwards20:07
@jim:acmegating.coma ZK get takes about 3ms20:08
@jim:acmegating.com(that time is from opendev production)20:08
@tobias.henkel:matrix.orgI think since that data is potentially unbounded (especially with lots of branches) I'd vote for the special table in sql20:09
@jim:acmegating.commy guess is performance would be about the same for a dedicated sql table (especially since our best case with our 18 million row table is 2ms)20:09
@tobias.henkel:matrix.orgwe have 74M with 21k entries but with a time database that has been cleared 5 days ago20:10
@jim:acmegating.comgood point20:10
@tobias.henkel:matrix.org(our time database is currently not persistent)20:10
@jim:acmegating.comi'll do a sql mock up with the opendev production data and get some timings for that20:10
@tobias.henkel:matrix.orgthanks!20:11
-@gerrit:opendev.org- Jeremy Stanley proposed:20:35
- [zuul/zuul-jobs] Pin to funcparserlib prerelease for new SetupTools https://review.opendev.org/c/zuul/zuul-jobs/+/809885
- [zuul/zuul-jobs] Add tox_config_file rolevar to tox https://review.opendev.org/c/zuul/zuul-jobs/+/806613
- [zuul/zuul-jobs] Support verbose showconfig in tox siblings https://review.opendev.org/c/zuul/zuul-jobs/+/806621
- [zuul/zuul-jobs] Include tox_extra_args in tox siblings tasks https://review.opendev.org/c/zuul/zuul-jobs/+/806612
- [zuul/zuul-jobs] Explicit tox_extra_args in zuul-jobs-test-tox https://review.opendev.org/c/zuul/zuul-jobs/+/809456
- [zuul/zuul-jobs] Stop testing playbooks/roles on Ubuntu Xenial https://review.opendev.org/c/zuul/zuul-jobs/+/809899
@bridgefan:matrix.orgtest20:41
@clarkb:matrix.orgfungi: your stack lgtm21:05
@jim:acmegating.comall +321:10
@jim:acmegating.comi'm currently running a test where i iterate over every build in our db (18M) and either insert or update the row-per-job table i'm proposing.  basically, it's doing exactly what our scheduler would have done if we'd had this in place for the past 4 years.  it's looking like pg scales linearly with the pair of queries (select+insert or update) taking about 1ms for every 10k rows in the table.  so we're at 3ms for the operation now.21:15
@jim:acmegating.comi think that should be acceptable21:16
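(Editor's sketch of the row-per-job idea with the select plus insert-or-update pair described above. It uses in-memory SQLite rather than PostgreSQL so it runs anywhere. The table name `job_time`, its columns, and the running-average bookkeeping are assumptions for illustration, not the schema proposed in the change.)

```python
import sqlite3

KEY_COLUMNS = ("tenant", "project", "branch", "job")
WHERE = " AND ".join(f"{col} = ?" for col in KEY_COLUMNS)


def make_db():
    conn = sqlite3.connect(":memory:")
    # One row per job; the primary key indexes the selector columns.
    conn.execute(
        "CREATE TABLE job_time ("
        " tenant TEXT, project TEXT, branch TEXT, job TEXT,"
        " avg_seconds REAL, samples INTEGER,"
        " PRIMARY KEY (tenant, project, branch, job))"
    )
    return conn


def record_build(conn, key, seconds):
    """Select the job's row, then insert it or update it in place."""
    row = conn.execute(
        f"SELECT avg_seconds, samples FROM job_time WHERE {WHERE}", key
    ).fetchone()
    if row is None:
        conn.execute(
            "INSERT INTO job_time VALUES (?, ?, ?, ?, ?, ?)",
            (*key, seconds, 1),
        )
    else:
        avg, samples = row
        new_avg = (avg * samples + seconds) / (samples + 1)
        conn.execute(
            f"UPDATE job_time SET avg_seconds = ?, samples = ? WHERE {WHERE}",
            (new_avg, samples + 1, *key),
        )


conn = make_db()
key = ("example-tenant", "example/project", "master", "unit-tests")
record_build(conn, key, 100.0)
record_build(conn, key, 200.0)
rows = conn.execute("SELECT avg_seconds, samples FROM job_time").fetchall()
```

The table stays at one row per distinct job no matter how many builds are recorded, which is why its size tracks the number of jobs rather than the number of builds.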
@clarkb:matrix.orgswest: found an issue in https://review.opendev.org/c/zuul/zuul/+/807102 but the rest of the stack lgtm and I expect it is quite close to merging21:47
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] Move time database to SQL https://review.opendev.org/c/zuul/zuul/+/80884122:50
@jim:acmegating.comClark, tobiash: ^ okay... that was "easy"  /s22:51
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed on behalf of Simon Westphahl: [zuul/zuul] Clean up dangling cache data nodes more often https://review.opendev.org/c/zuul/zuul/+/80710222:52
@jim:acmegating.comClark: ^ i think we can ninja-fix that ^22:52
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed on behalf of Simon Westphahl:22:53
- [zuul/zuul] Move common change cache related methods to mixin https://review.opendev.org/c/zuul/zuul/+/809632
- [zuul/zuul] Simplify Zookeeper change cache API https://review.opendev.org/c/zuul/zuul/+/809633
@clarkb:matrix.org+2'd22:56
