opendevreview | James E. Blair proposed zuul/zuul master: Use a new NodeRequest object when re-submitting node requests https://review.opendev.org/c/zuul/zuul/+/803407 | 00:42 |
---|---|---|
corvus | clarkb: ^ with a test! | 00:43 |
corvus | thinking more on that -- we're more likely to hit that now that the executors are the ones which actually accept the request -- there's a longer window between fulfillment and acceptance. | 00:44 |
Clark[m] | corvus: I skimmed the fix and it looks good. I'll give it a proper review in the morning | 01:21 |
Clark[m] | I noticed this evening that trying to search builds on mobile doesn't work because the menu that drops down for search closes when I try to select the search text field and type in it | 01:31 |
opendevreview | xinliang proposed zuul/zuul-jobs master: Fix install podman error on Ubuntu aarch64 Bionic https://review.opendev.org/c/zuul/zuul-jobs/+/803413 | 02:34 |
opendevreview | xinliang proposed zuul/zuul-jobs master: Fix install podman error on Ubuntu aarch64 Bionic https://review.opendev.org/c/zuul/zuul-jobs/+/803413 | 03:22 |
opendevreview | xinliang proposed zuul/zuul-jobs master: Fix install podman error on Ubuntu aarch64 Bionic https://review.opendev.org/c/zuul/zuul-jobs/+/803413 | 03:24 |
opendevreview | xinliang proposed zuul/zuul-jobs master: Fix install podman error on Ubuntu aarch64 Bionic https://review.opendev.org/c/zuul/zuul-jobs/+/803413 | 03:45 |
felixedel | corvus: Right, haven't thought about that yesterday. Thanks for squashing the changes. | 05:19 |
*** marios is now known as marios|ruck | 06:06 | |
abitrolly[m] | Kafka is now capable of running without Zookeeper - https://developer.ibm.com/tutorials/kafka-in-kubernetes/ Can Zuul do the same? | 06:39 |
*** jpena|off is now known as jpena | 07:01 | |
*** rpittau|afk is now known as rpittau | 07:13 | |
apevec[m] | so that would require Zuul feature like https://cwiki.apache.org/confluence/display/KAFKA/KIP-500%3A+Replace+ZooKeeper+with+a+Self-Managed+Metadata+Quorum ? | 07:38 |
avass[m] | abitrolly: it would probably require quite a bit of work to replace Zookeeper with something else for zuul (or add support for other systems like etcd). I don't think anyone is strictly opposed to it but with the current resources the development effort is focused on replacing gearman with zookeeper | 07:45 |
abitrolly[m] | What was rationale for choosing Zookeeper? | 07:48 |
avass[m] | abitrolly: you'll have to ask someone more involved in the process for a good answer. But i remember seeing discussions about that earlier and etcd wasn't as mature when the decision was taken. | 08:00 |
mhuin | abitrolly[m], I think this was discussed in this spec: https://opendev.org/zuul/zuul/src/branch/master/doc/source/reference/developer/specs/scale-out-scheduler.rst | 08:01 |
mhuin | I may be wrong though | 08:02 |
abitrolly[m] | The article just says that Zookeper will be used without any rationale or mentioning alternative fault resistance mechanisms (i.e. Raft). It is unfortunate that such decisions are not getting proper write ups. People could learn a lot from that. | 08:04 |
avass[m] | Yeah i think zookeeper was also chosen for zuul since nodepool was already using it | 08:05 |
swest | abitrolly: as avass said it was a natural choice. IIRC there where some discussions if etcd would be a better option | 08:06 |
abitrolly[m] | I don't understand what is a natural choice. | 08:07 |
abitrolly[m] | People had experience with Zookeeper and no experience with etcd, and choose what they know? | 08:08 |
swest | Zookeeper and etcd serve a similar purpose and Zookeeper was already used for Zuul-Nodepool communication (so yes, existing experience + code) | 08:10 |
swest | that's what I meant with "natural choice" | 08:10 |
swest | abitrolly: btw. I found the discussion on zuul-discuss in case you are interested: http://lists.zuul-ci.org/pipermail/zuul-discuss/2018-December/000659.html | 08:11 |
abitrolly[m] | From this doc on Nodepool - https://docs.opendev.org/opendev/infra-specs/latest/specs/nodepool-zookeeper-workers.html - it looks like there was a need in shared state between daemon and workers that is not in MySQL to avoid giving access to MySQL for workers. Is it possible to write an abstraction for that shared state, so that any storage can be used? Redis, DB, etcd/Zookeeper. | 08:16 |
swest | abitrolly: it's certainly possible but I'm not sure I understand your use case. Do you want to avoid using Zookeeper? | 08:22 |
abitrolly[m] | Simplicity for testing Zuul jobs locally. | 08:24 |
swest | abitrolly: sounds like your are looking for something like the zuul-runner (wip) https://zuul-ci.org/docs/zuul/reference/developer/specs/zuul-runner.html | 08:33 |
apevec[m] | this was 2018. "zk is used in enough things that many people have a vested interest in (like Kafka)" - not anymore | 08:39 |
*** jcapitao is now known as jcapitao_lunch | 11:11 | |
*** dviroel|out is now known as dviroel | 11:16 | |
*** jpena is now known as jpena|lunch | 11:26 | |
*** marios is now known as marios|ruck | 11:27 | |
tibeer[m] | Hello, I wanted to ask if someone has the Zuul-Operator working and could help me getting it running. During my testing I face the issue that at least one zookeeper is unable to join the cluster and the percona cluster is also not coming up correctly preventing the startup. | 12:11 |
fungi | abitrolly[m]: also that specification you found is from early 2016. at the time, zookeeper was the only reasonable choice for a mature and highly-available lock manager service packaged in existing linux distributions (etcd for example was too new and not yet packaged in stable distros, other options were only distributed under proprietary/non-osi-approved licenses, and so on) | 12:25 |
fungi | tibeer[m]: are you using 0.1.0 or the current master branch state as of this weekend when all of corvus's refactor was merged? | 12:29 |
tibeer[m] | fungi: Didn't noticed that. I'm using the current master branch. | 12:30 |
tibeer[m] | I'll try 0.1.0 later, thanks! | 12:30 |
*** jpena|lunch is now known as jpena | 12:32 | |
opendevreview | Sorin Sbârnea proposed zuul/zuul-jobs master: Include podman installation with molecule https://review.opendev.org/c/zuul/zuul-jobs/+/803471 | 12:37 |
fungi | tibeer[m]: well, i was asking because i'd heard there were a number of problems with 0.1.0, hence all the recent work on redoing it. on closer inspection there's still a number of changes not yet merged in the https://review.opendev.org/q/topic:"kopf" topic, so maybe what's merged so far is not complete | 12:37 |
*** jcapitao_lunch is now known as jcapitao | 12:42 | |
tibeer[m] | fungi: I quickly skimmed all change subjects and they seem not to fit the issue(s) I am facing. | 12:43 |
fungi | tibeer[m]: yes, as did i. still, maybe avass[m] or corvus have some idea what's going on there since they seem to have written most of the changes for that implementation | 12:45 |
tibeer[m] | I am happy to help / contribute. We are actively looking into Zuul on Kubernetes. | 12:47 |
avass[m] | It's a bit hard to know why that happens without more information. | 12:48 |
avass[m] | tibeer: I don't have any setup to quickly take a look at it since I'm on vacation. But if you have any logs from the zk/db operators i could probably take a quick look | 12:51 |
opendevreview | Benjamin Schanzel proposed zuul/nodepool master: Add Tenant-Scoped Resource Quota https://review.opendev.org/c/zuul/nodepool/+/800765 | 12:54 |
tibeer[m] | avass: Very kind from you but please enjoy your vacation. I'd like to push that to a later point in time. I would like to take a lot of notes and contribute that to the documentation, as we are using it for GitHub. So others can profit from some sort of troubleshooting guide as well. | 12:55 |
opendevreview | Benjamin Schanzel proposed zuul/zuul master: statsd: expose current_requests per tenant https://review.opendev.org/c/zuul/zuul/+/788680 | 13:06 |
avass[m] | tibeer: cool thanks! I'm planning on working on it some time after my vacation and if there are still issues then I'll take a look at it :) | 13:06 |
corvus | tibeer: it sounds like the zk cluster may not be sized correctly -- you'll need at least 3 somewhat large nodes to support all 3 zk and pxc nodes. i'd check the k8s status for those and see if it's failing constraints or taints. | 13:42 |
opendevreview | Benjamin Schanzel proposed zuul/zuul master: Add tenant name on NodeRequests for Nodepool https://review.opendev.org/c/zuul/zuul/+/788680 | 14:22 |
opendevreview | Benjamin Schanzel proposed zuul/nodepool master: Add Tenant-Scoped Resource Quota https://review.opendev.org/c/zuul/nodepool/+/800765 | 14:49 |
clarkb | swest: can you check my comment on https://review.opendev.org/c/zuul/zuul/+/803407 if I'm wrong about that assumption it would be good to clarify | 14:50 |
swest | clarkb: responded to your comment | 14:52 |
clarkb | swest: I see the other direction is the issue | 14:53 |
clarkb | corvus: ^ | 14:53 |
swest | yep | 14:53 |
clarkb | corvus: swest: maybe we need a request.clear() or .reset() method to call and have it modify the internal state to be safe? | 14:59 |
clarkb | oh we do have access to the scheduler in the nodepool class there. We could probably lookup the buildset and modify its requests list too? | 15:00 |
corvus | clarkb: i think we should just make a minimal change to the existing code for now; an upcoming change will allow us to avoid re-using the node request later | 15:00 |
clarkb | corvus: that wfm | 15:01 |
corvus | i'll update in just a bit | 15:01 |
*** jpena is now known as jpena|off | 15:05 | |
opendevreview | Sorin Sbârnea proposed zuul/zuul-jobs master: Include podman installation with molecule https://review.opendev.org/c/zuul/zuul-jobs/+/803471 | 15:10 |
opendevreview | James E. Blair proposed zuul/zuul master: Execute merge jobs via ZooKeeper https://review.opendev.org/c/zuul/zuul/+/802299 | 15:14 |
*** marios|ruck is now known as marios|out | 16:00 | |
opendevreview | Sorin Sbârnea proposed zuul/zuul-jobs master: Include podman installation with molecule https://review.opendev.org/c/zuul/zuul-jobs/+/803471 | 16:11 |
*** rpittau is now known as rpittau|afk | 16:14 | |
clarkb | felixedel: corvus: I left a number of comments on https://review.opendev.org/c/zuul/zuul/+/787686 I'm not sure if any are worth a -1 but wanted to make sure you all saw those | 16:46 |
corvus | clarkb: thx; will look now -- i'm doing a final test run on the updated fix, should have a patch in a min | 16:47 |
clarkb | cool I'll plan to review that next then. I should context switch to gerrit account retirements after that though as I said I would get that done today | 16:47 |
opendevreview | James E. Blair proposed zuul/zuul master: Clear nodeset when re-submitting node requests https://review.opendev.org/c/zuul/zuul/+/803407 | 17:12 |
corvus | clarkb: i think that should be ready now | 17:12 |
clarkb | reviewing | 17:14 |
clarkb | corvus: how does copying the nodeset fix this? is that because nodeset.copy() is only copying the logical nodeset info and not the fulfilled nodeset info? | 17:21 |
corvus | clarkb: yes. but also, i think some node state is in zkdata too | 17:22 |
corvus | * clarkb: yes. but also, i think some node state is in `_zk_data` too | 17:22 |
clarkb | corvus: ok, looking at copy() and addNode() I think it may just produce the same data again which makes me suspect _zk_data more. But if this is tested then good enough for me /me reviews the test | 17:23 |
corvus | clarkb: yes, i think _zk_data is the most important, but i think it's also true that the Node objects in the nodeset do have ip addresses, etc, and they will not after the copy | 17:25 |
corvus | (the copy creates new node objects with only names and labels) | 17:25 |
clarkb | got it | 17:26 |
clarkb | lgtm +2 | 17:29 |
clarkb | swest: ^ if your day hasn't ended you amy want to rereview that | 17:29 |
corvus | clarkb: replied to your comments on merger. i think there are some -1s in there. for some of the improvements -- the merger and executor apis are nearly mirror images now. i'd like to keep them as close as possible, and now that we know what each of them needs, i'd like to follow this stack up with a refactor to try to merge them. it's not trivial -- each has something the other doesn't, but i think it's possible with a bit of work, and should | 17:30 |
corvus | make maintenance much easier. | 17:30 |
corvus | (differences: the executor has zones; the merger returns data; neither is true for the other) | 17:31 |
corvus | zuul-maint: https://review.opendev.org/803407 is a production bugfix which i'd love to merge and restart opendev with today | 17:32 |
fungi | oh, right, that's where the marathon troubleshooting ended up last night. reviewing | 17:34 |
clarkb | corvus: merge api change comments responded to. I'm ok with cleaning up those bigger things in refactoring changes. But a few of those like the removal of dead code and unused conditions as well as handling param deletion etc should probably get cleaned up in this change to make it easier to understand going forward | 17:36 |
corvus | clarkb: yep, will do next. | 17:37 |
*** ricolin_ is now known as ricolin | 18:02 | |
opendevreview | James E. Blair proposed zuul/zuul master: [web] Update the tenantsPage to drop redux https://review.opendev.org/c/zuul/zuul/+/803504 | 18:21 |
fungi | corvus: 803407,2 has failed its tox unit test jobs, though looks like the buildset isn't likely to report (probably courtesy of the bug it aims to fix? i haven't dug into it) | 18:35 |
fungi | "testtools.matchers._impl.MismatchError: 'unknown' != 'ready'" on test_accept_nodes_lost_request and test_accept_nodes_lost_request | 18:38 |
fungi | er, on TestNodepoolResubmit.test_accept_nodes_lost_request and TestNodepool.test_accept_nodes_lost_request | 18:38 |
clarkb | hrm we didn't modify TestNodepool just inherited from it. If we were going to have errors I would've expected it in the child only | 18:42 |
fungi | nice though that we have proper build result pages linked from those even if the buildset never reports! | 18:44 |
fungi | that feature is already paying dividends | 18:44 |
corvus | oh, i overrode a method, i think i just need to refactor that out into a base class; i didn't want the test methods in super to run | 19:15 |
corvus | and the change requires an update to that test. | 19:19 |
opendevreview | Tristan Cacqueray proposed zuul/zuul master: [web] Update the TenantsPage to use the context hook https://review.opendev.org/c/zuul/zuul/+/803509 | 19:20 |
opendevreview | James E. Blair proposed zuul/zuul master: Clear nodeset when re-submitting node requests https://review.opendev.org/c/zuul/zuul/+/803407 | 19:20 |
*** y2kenny is now known as Guest3323 | 20:05 | |
*** Guest3323 is now known as y2kenny | 20:06 | |
clarkb | corvus: in ^ why does the state of the nodes chagne from ready to unknown in that test? | 20:06 |
y2kenny | For smtp/email reporter, is the email address configured per pipeline or can the email address be configured at the job level? | 20:07 |
clarkb | y2kenny: since reporters are done per pipeline it is only per pipeline iirc | 20:08 |
clarkb | you could have a job send email itself (openstack does this for some release jobs) | 20:08 |
y2kenny | clark: ah ok. THanks. | 20:11 |
corvus | clarkb: because the nodeset in the test is the nodeset on the build request, and now we're resetting it. that test ends with an unfulfilled node request, so 'unknown' is legit. previously it ended with an unfulfilled request which had nodes, and those nodes were ready. i think the important thing is to assert that they aren't locked, and aren't used or somethign. | 20:12 |
clarkb | I see, its a side effect of our new reset | 20:14 |
clarkb | corvus: I've +2'd the change but not approved it since i Think the unittests are close to finishing and getting that info if they fail seems like a good idea | 20:20 |
corvus | clarkb: sgtm; i haven't re-run the full suite locally so i agree | 20:20 |
*** dviroel is now known as dviroel|out | 20:38 | |
corvus | one of the test suites finished, i'm going to go ahead and +w it | 20:40 |
fungi | awesome, i'll be around for a restart in an hour or so, just putting dinner together now | 20:49 |
*** timburke_ is now known as timburke | 20:57 | |
opendevreview | James E. Blair proposed zuul/zuul master: Add MergerApi https://review.opendev.org/c/zuul/zuul/+/787686 | 21:19 |
opendevreview | James E. Blair proposed zuul/zuul master: Fix test_gerrit.TestPolling.test_config_update https://review.opendev.org/c/zuul/zuul/+/773023 | 21:20 |
opendevreview | James E. Blair proposed zuul/zuul master: Execute merge jobs via ZooKeeper https://review.opendev.org/c/zuul/zuul/+/802299 | 21:20 |
corvus | clarkb: good suggestions :) | 21:20 |
clarkb | corvus: +2'd the first one. The relationship between read, wait and delete in the event result futures is fun to unravel | 21:31 |
clarkb | https://review.opendev.org/c/zuul/zuul/+/802299/ is next and not a small change :) I'll probably hold off on digging into that until after we restart zuul to fix the ongoing issue and then I need to find a nice spot in the shade :) | 21:36 |
corvus | clarkb: sounds good. hopefully the split between "add the api" and "use the api" makes sense. no rush on those. | 21:46 |
corvus | my plan is to help with restart when ready and otherwise hack on the refactor | 21:46 |
fungi | good, dinner done and it hasn't merged yet | 21:48 |
fungi | i'm not too late | 21:48 |
corvus | it timed out :( | 22:33 |
clarkb | :( | 22:39 |
opendevreview | James E. Blair proposed zuul/zuul master: WIP: Refactor Merger/Executor API https://review.opendev.org/c/zuul/zuul/+/803529 | 23:43 |
corvus | that's just a checkpoint -- it's not done yet; i still have a lot of test methods to update. however, the basic executor and merger tests both work with that, so i think the approach is viable. | 23:44 |
corvus | so far it's a net -292 line change | 23:45 |
opendevreview | Merged zuul/zuul master: Clear nodeset when re-submitting node requests https://review.opendev.org/c/zuul/zuul/+/803407 | 23:47 |
clarkb | woot it merged | 23:47 |
clarkb | now we wait for image promotion | 23:48 |
clarkb | https://zuul.opendev.org/t/zuul/build/d281dbe097124a3899ba25bfb8113355 that job succeeded | 23:49 |
clarkb | corvus: ^ should we proceed wtih a restart? I'm not sure what else has merged since the last restart and how risk it is due to the other changes | 23:50 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!