*** mattw4 has quit IRC | 00:03 | |
ianw | are we getting "waiting on logger" a lot more in the streaming now? | 00:29 |
---|---|---|
fungi | it has seemed like it, but i don't have enough long-term memory left to remember clearly a time when we weren't | 00:33 |
clarkb | ya I think we always got some of it, though it may be more of it now. Basically it's waiting for the remote test node side daemon to start listening on port 19885 | 00:35 |
clarkb | possible that newer ansible doesn't trigger the bit that starts that until later now? (could be side effect of ansible 2.7 -> 2.8 default change?) | 00:35 |
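One way to check by hand whether the test-node daemon clarkb describes is listening yet is a plain TCP probe. This is a minimal sketch, not part of Zuul: the helper name `is_listening` and the timeout are assumptions, and only the port number 19885 comes from the discussion above.

```python
import socket

# Port named in the discussion above; everything else here is illustrative.
ZUUL_CONSOLE_PORT = 19885


def is_listening(host, port=ZUUL_CONSOLE_PORT, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `is_listening("test-node.example.org")` (a hypothetical hostname) in a loop would show roughly how long the daemon takes to come up after a job starts.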
fungi | any idea if a full-reconfigure is needed to get the scheduler to stop applying implied branch matchers after deleting a branch from gerrit? | 00:37 |
clarkb | fungi: I thought our mergers handled that on git repo update, which may mean you need a new commit to land. If a full reconfigure implies a git repo update I think that would work too | 00:42 |
clarkb | and I think full reconfigure does imply that because it's what is done at startup and zuul wants to make sure it is up to date when doing that | 00:43 |
fungi | clarkb: yep, on a whim i already gave that a shot, and it seemed to take care of it | 00:45 |
fungi | two uses for zuul-scheduler reconfigure in one day! | 00:45 |
fungi | er, zuul-scheduler full-reconfigure | 00:46 |
fungi | two different utc days though, technically | 00:46 |
*** sgw has quit IRC | 00:50 | |
*** zenkuro13 has quit IRC | 01:36 | |
*** dangtrinhnt has joined #zuul | 01:38 | |
*** michael-beaver has quit IRC | 01:38 | |
dangtrinhnt | Hi, I have a couple of sessions at my company presenting about Zuul and people are so excited to use it. | 01:40 |
dangtrinhnt | I will have a chance to do a pilot at my company over a couple of months. My plan is to replace our Jenkins with Zuul. | 01:40 |
clarkb | dangtrinhnt: exciting. Let us know how it goes and if we can help | 01:45 |
dangtrinhnt | Sure. Thanks. :D | 01:45 |
fungi | that's awesome | 01:45 |
fungi | yeah, do let us know if you have any questions | 01:45 |
dangtrinhnt | thanks. | 01:46 |
*** rlandy has quit IRC | 02:42 | |
*** rlandy has joined #zuul | 03:21 | |
*** rlandy has quit IRC | 03:22 | |
*** dangtrinhnt has quit IRC | 03:55 | |
*** dangtrinhnt has joined #zuul | 03:56 | |
*** dangtrinhnt has quit IRC | 04:01 | |
*** dangtrinhnt has joined #zuul | 04:07 | |
*** raukadah is now known as chandankumar | 05:09 | |
*** evrardjp has quit IRC | 05:34 | |
*** evrardjp has joined #zuul | 05:35 | |
openstackgerrit | Merged zuul/zuul-jobs master: install-javascript-packages: add tox_constraints_file https://review.opendev.org/709414 | 05:40 |
*** swest has quit IRC | 06:17 | |
SpamapS | mordred:thanks, I added a question | 06:20 |
*** dangtrinhnt has quit IRC | 06:24 | |
*** dangtrinhnt has joined #zuul | 06:26 | |
*** saneax has joined #zuul | 06:33 | |
*** swest has joined #zuul | 06:42 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Optimize canMerge using graphql https://review.opendev.org/709836 | 07:44 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Evaluate CODEOWNERS settings during canMerge check https://review.opendev.org/644557 | 07:44 |
*** avass has joined #zuul | 07:51 | |
*** harrymichal has joined #zuul | 08:00 | |
*** bolg has joined #zuul | 08:01 | |
*** avass has quit IRC | 08:09 | |
*** jcapitao has joined #zuul | 08:11 | |
bolg | mordred: also + for dropping 3.5 :) | 08:14 |
*** Defolos has joined #zuul | 08:18 | |
*** tosky has joined #zuul | 08:20 | |
*** jpena|off is now known as jpena | 08:46 | |
* mnaser is actually a fan of old-style depends-on :( | 09:04 | |
mnaser | it is extremely useful when doing cherry-picks to different branches | 09:05 |
mnaser | because i dont have to modify the messages on all of them, it just naturally works | 09:05 |
*** lennyb has quit IRC | 09:06 | |
*** mhu has quit IRC | 09:07 | |
*** lennyb has joined #zuul | 09:07 | |
*** dangtrinhnt has quit IRC | 09:14 | |
*** mhu has joined #zuul | 09:14 | |
*** dangtrinhnt has joined #zuul | 09:16 | |
*** dangtrinhnt has quit IRC | 09:17 | |
*** dangtrinhnt has joined #zuul | 09:17 | |
openstackgerrit | Jan Kubovy proposed zuul/zuul master: Refactor `self.event_queues` in tests https://review.opendev.org/709990 | 09:26 |
*** jfoufas1 has joined #zuul | 09:32 | |
openstackgerrit | Felix Edel proposed zuul/zuul master: Dequeue items via buildset uuid https://review.opendev.org/709135 | 10:05 |
*** harrymichal has quit IRC | 10:09 | |
*** harrymichal_ has joined #zuul | 10:10 | |
*** harrymichal_ is now known as harrymichal | 10:10 | |
*** felixedel has joined #zuul | 10:18 | |
*** dangtrinhnt has quit IRC | 10:18 | |
*** dangtrinhnt has joined #zuul | 10:19 | |
felixedel | corvus: I've updated https://review.opendev.org/#/c/633501/23 to follow up on your latest comments. We were indeed focusing on two different API endpoints :D So, now the buildset endpoint returns a JSON that's equivalent to the one included in the MQTT report. I will add a follow up change to improve the UI part showing also the retried builds on the buildset page. | 10:20 |
*** harrymichal has quit IRC | 10:22 | |
*** dangtrinhnt has quit IRC | 10:24 | |
*** ianychoi_ is now known as ianychoi | 11:32 | |
*** jfoufas1 has quit IRC | 12:02 | |
*** dangtrinhnt has joined #zuul | 12:06 | |
*** sshnaidm is now known as sshnaidm|bbl | 12:09 | |
*** jcapitao is now known as jcapitao_lunch | 12:10 | |
*** avass has joined #zuul | 12:20 | |
*** jpena is now known as jpena|lunch | 12:34 | |
*** dangtrinhnt has quit IRC | 12:38 | |
*** Goneri has joined #zuul | 12:38 | |
*** dangtrinhnt has joined #zuul | 12:40 | |
*** harrymichal has joined #zuul | 12:42 | |
*** dangtrinhnt has quit IRC | 12:50 | |
*** dangtrinhnt has joined #zuul | 12:52 | |
*** dangtrinhnt has quit IRC | 12:55 | |
openstackgerrit | Jan Kubovy proposed zuul/zuul master: Scheduler's pause/resume functionality https://review.opendev.org/709735 | 12:59 |
*** harrymichal has quit IRC | 13:00 | |
*** rlandy has joined #zuul | 13:00 | |
*** harrymichal has joined #zuul | 13:00 | |
*** jcapitao_lunch is now known as jcapitao | 13:17 | |
openstackgerrit | Jan Kubovy proposed zuul/zuul master: Scheduler's pause/resume functionality https://review.opendev.org/709735 | 13:22 |
openstackgerrit | Jan Kubovy proposed zuul/zuul master: WIP: Store unparsed branch config in Zookeeper https://review.opendev.org/705716 | 13:25 |
*** jpena|lunch is now known as jpena | 13:26 | |
*** avass has quit IRC | 13:26 | |
*** Goneri has quit IRC | 13:28 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Refactor github auth handling into its own class https://review.opendev.org/710034 | 13:35 |
*** Goneri has joined #zuul | 13:42 | |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: Tests ensure-tox on all-platforms https://review.opendev.org/708642 | 13:53 |
*** jamesmcarthur has joined #zuul | 13:53 | |
openstackgerrit | Jan Kubovy proposed zuul/zuul master: WIP: Store unparsed branch config in Zookeeper https://review.opendev.org/705716 | 13:54 |
*** harrymichal has quit IRC | 13:57 | |
*** harrymichal has joined #zuul | 13:59 | |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: Tests bindep role on all-platforms https://review.opendev.org/708704 | 14:02 |
fungi | mnaser: i don't foresee it being necessary to drop the old-style dependency syntax, it's mainly deprecated because it's vague in a multi-connection scenario and so discouraged over being more specific about the system in which the change resides | 14:04 |
fungi | also not all gerrit deployments/projects require a change-id | 14:05 |
fungi | opendev's gerrit deployment has standardized on/recommended it because we think it's useful | 14:06 |
*** harrymichal has quit IRC | 14:24 | |
*** jamesmcarthur has quit IRC | 14:27 | |
*** jamesmcarthur has joined #zuul | 14:30 | |
*** sgw has joined #zuul | 14:32 | |
*** jamesmcarthur has quit IRC | 14:32 | |
mnaser | fungi: I didn’t know gerrit use was possible without change ID! I thought that’s how it recognized revisions | 14:53 |
fungi | that's one way, another is pushing to a specific remote ref corresponding to the change | 14:55 |
fungi | git-review relies on gerrit change-id due to how it's set up | 14:55 |
*** sshnaidm|bbl is now known as sshnaidm | 14:56 | |
fungi | and i think it's a more intuitive solution, even though it does rely on having a commit hook to append a random id into commit messages | 14:56 |
corvus | depends-on: change-id is deprecated and planned for removal; see "An older syntax" in https://zuul-ci.org/docs/zuul/discussion/gating.html | 14:57 |
fungi | when we do remove it, i'm assuming it'll be because of the maintenance burden for the additional code paths | 14:58 |
corvus | fungi: that (it is a tortuous code path) and also to promote parity and a unified experience between drivers | 14:59 |
fungi | and i think we haven't rushed to remove it because doing so would silently stop applying dependencies for existing changes which haven't switched to the new syntax | 15:01 |
fungi | though maybe we could start returning errors if a malformed depends-on footer is found | 15:01 |
corvus | mostly as a courtesy to the folks doing third-party ci on opendev that are still running zuul v2 | 15:01 |
corvus | yeah, warnings are doable | 15:02 |
fungi | warnings too i suppose, but i was thinking in terms of when support is eventually removed, we could opt to have zuul refuse to run jobs instead of silently ignoring the depends-on | 15:04 |
fungi | especially because folks may be relying on dependencies to block changes from merging | 15:04 |
fungi | we've already seen that when opendev switched the canonical hostname for its gerrit in zuul | 15:04 |
fungi | some changes with depends-on to changes using the old hostname were no longer blocked from merging | 15:05 |
fungi | and occasionally got merged prematurely as a result | 15:05 |
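The difference between the old-style (Change-Id) and URL forms of the Depends-On footer, and fungi's idea of flagging malformed footers instead of silently ignoring them, can be sketched as below. This is an illustration only and not Zuul's actual parser: the function name `classify_depends_on` and the "malformed" label are assumptions. The Change-Id pattern (`I` followed by 40 hex digits) is the format Gerrit's commit hook generates.

```python
import re

# Gerrit Change-Ids are "I" followed by 40 hexadecimal characters.
CHANGE_ID_RE = re.compile(r"^I[0-9a-fA-F]{40}$")
# One footer per line; the value is the first whitespace-free token.
FOOTER_RE = re.compile(r"^Depends-On:[ \t]*(\S+)[ \t]*$",
                       re.IGNORECASE | re.MULTILINE)


def classify_depends_on(commit_message):
    """Return (value, kind) pairs for each Depends-On footer.

    kind is "change-id" (old style), "url" (new style) or "malformed".
    """
    results = []
    for value in FOOTER_RE.findall(commit_message):
        if CHANGE_ID_RE.match(value):
            kind = "change-id"
        elif value.startswith(("http://", "https://")):
            kind = "url"
        else:
            kind = "malformed"
        results.append((value, kind))
    return results
```

A reporter could use the "malformed" results to warn or refuse to run jobs, which is the behaviour fungi suggests for when old-style support is eventually removed.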
*** swest has quit IRC | 15:06 | |
*** saneax has quit IRC | 15:07 | |
mnaser | speaking of canonical url, one of the things I’ve struggled with is that our gerrit connection driver uses an internal gerrit URL (gerrit.gerrit) as it’s hosted inside kubernetes. | 15:15 |
mnaser | Even though we have our canonical hostname set to something else, the web UI shows the gerrit.gerrit path | 15:16 |
mnaser | I think maybe a driver option to include “public facing URL” or maybe gerrit exposes that somewhere and we can use that in the driver instead | 15:16 |
corvus | mnaser: where? (that's a bug, and since we do the same thing in opendev, it should be easy for us to see) | 15:16 |
mnaser | corvus: for example, on the “builds” page in the web UI, when you click on a review ID there | 15:17 |
mnaser | For a specific build | 15:17 |
mnaser | I assume it’s likely similar in the status page too | 15:17 |
mordred | mnaser: oh - because your zuul is talking to the internal url of the gerrit? | 15:18 |
mnaser | mordred: yep | 15:18 |
corvus | mnaser: but there's also a public url for gerrit? | 15:18 |
mnaser | yep, there is a public URL but I’d like zuul to use the internal one to avoid doing a big loop | 15:19 |
mnaser | So I have my user public facing URL, and the internal one that lives within the cluster network (overlay in this case) | 15:19 |
fungi | so you're using different hostnames to work around different paths through the network topology | 15:19 |
*** mugsie has quit IRC | 15:20 | |
mnaser | I don’t think using canonical hostname is the right option for this either cause that would break me (and OpenDev too) | 15:20 |
mnaser | fungi: correct | 15:20 |
corvus | yes, it is not currently anticipated that zuul and users would use a different hostname to access the same gerrit | 15:20 |
fungi | the classical solution for that is split-horizon dns, but that is likely overkill for this one case alone | 15:20 |
fungi | if you had a lot of instances of this sort of problem then solving it at the dns server level would make more sense | 15:21 |
corvus | (canonical hostname is for when the gerrit url is not the url where users typically clone the authoritative upstream repo from) | 15:21 |
mnaser | If that’s something that sounds like it makes sense for zuul and we can come up with an option name I can push up a patch | 15:21 |
*** mugsie has joined #zuul | 15:22 | |
* mnaser about to take off, will be around in a short 45 minutes hopefully :) | 15:23 | |
fungi | another popular solution is to solve it at the routing level, publishing separate routes for that address space to the machines which shouldn't use the global routes, and possibly also performing 1:1 address translation if those addresses aren't bound directly to the servers themselves | 15:23 |
corvus | mnaser: i'm game, but it'll need to hit all the drivers | 15:27 |
mordred | fungi: yeah - but in this case I think the internal location is just the standard internal k8s service name and the external is the external dns name attached to the published external ip - so to do that would involve updating k8s | 15:27 |
mordred | corvus: yah. because one could imagine someone doing a private github + zuul in the same model | 15:27 |
corvus | yup | 15:27 |
fungi | mordred: seems like the routing solution should be doable in network namespaces for the individual containers, but i don't know a whole lot about how the kernel routes between namespaces | 15:28 |
mordred | fungi: I think the issue might be separation of ownership in this case - k8s owns the internal ip and name but likely not the external dns name - so there might not be anyone (other than mnaser as a human) who has complete enough context to do the split routing | 15:29 |
mordred | fungi: k8s runs an internal dns service that it uses to assign logically addressable names to pods, so that intra-k8s communication can be done via logical names leveraging dns | 15:30 |
corvus | i think a notable aspect of this is that the public ip is potentially entirely outside of k8s (ie, an external load balancer); though if the load balancer is configured by k8s (via a custom ingress controller), then k8s can know the ip, so could theoretically create an internal route for it | 15:30 |
mordred | corvus: yah | 15:31 |
corvus | mordred: but routing is by ip, not name | 15:31 |
corvus | and k8s might know the ip | 15:31 |
mordred | that's true | 15:31 |
fungi | well, i was thinking in terms of destination route in the container namespaces for the public address, and layer-3 translation to map that address to the actual internal one | 15:31 |
fungi | so that the containers themselves when communicating with the "public" addresses of other services just use local routes | 15:32 |
fungi | rather than hitting the "load balancer" (whatever it may be) | 15:32 |
corvus | fungi: right, i think that's theoretically possible if k8s knows that the public ip == an internal service | 15:33 |
fungi | but if the desire is to also have kubernetes set up and maintain all the local routing in your container network namespaces (rather than injecting the routes and translation rules into the network namespaces yourself through other means) then that likely depends on whether it supports such notions | 15:33 |
fungi | it's one of the reasons i struggle to appreciate kubernetes, it seems lots of people treat it as a black box and anything it can't do itself is simply deemed not possible | 15:34 |
corvus | fungi, mordred: fwiw, i just did a traceroute from zuul-scheduler on gerrit's zuul to ci.gerritcodereview.com and got 9 hops terminating at the public ip, and did the same to zuul-web.zuul (the internal service name) and got 2 hops. so based on my *extensive* testing, it seems like we do not get a pony. | 15:35 |
fungi | these are long-standing solutions to these sorts of problems which predate kubernetes by decades, and just because they're not kubernetes features doesn't mean they can't still be used alongside it | 15:36 |
mordred | corvus: darn. I wanted a pony | 15:36 |
mordred | fungi: maybe another way to look at this one is that it's still possible this could be implemented by k8s - but just because k8s might implement it doesn't mean every k8s admin will have deployed it that way - so from a zuul perspective we should still probably support the split to be friendly to zuul admins regardless of the shape of their k8s | 15:37 |
mordred | all openstack services support public and internal endpoints in the keystone catalog for the same use case - fwiw | 15:38 |
fungi | i'm not opposed to the suggested feature, simply pointing out that there are already ways to accomplish the same goal through other means (name resolution, routing, et cetera) | 15:39 |
corvus | right, but since k8s controls both the name resolution and routing for the containers inside k8s, then unless/until k8s supports doing those things the options effectively aren't available | 15:40 |
fungi | i'm not a big fan of address translation to start with, but when address translation is already forced on you (it seems like kubernetes insists on it?) then fixing it with more address translation seems reasonable | 15:40 |
corvus | (and who knows? maybe it does? it's... complex... to put it mildly) | 15:41 |
mordred | fungi: you *can* actually set up a k8s with no NAT | 15:42 |
mordred | fungi: almost nobody does of course | 15:42 |
fungi | kubernetes doesn't exclusively control routing and name resolution for the containers though, as i understand it, you can still influence them from the host by manipulating the namespace routing tables and controlling whatever resolvers requests are forwarded to | 15:42 |
mordred | because the world has been somehow brainwashed into believing that NAT is a good thing | 15:42 |
corvus | (pnat is evil; nat is not bad) | 15:42 |
corvus | fungi: right, we could probably edit /etc/hosts or something. but that seems a bit heavy handed, and may not be a realistic option in some deployments. | 15:43 |
fungi | nat as a means of packet filtering is pretty bad though, and something we can blame cisco's purchase of pix and subsequent popularization for | 15:44 |
corvus | anyway, mnaser's patch will not be a one-liner, but it will be simpler than "deploy a new dns system" :) | 15:45 |
fungi | corvus: i was thinking split-horizon dns in the recursive resolver actually, but i know folks are often not a fan of that either | 15:45 |
corvus | er, i guess "deploy a new dns". i have to go to the atm machine. | 15:45 |
mordred | corvus: you must of done good on the SAT test | 15:45 |
Shrews | mordred: or just SAT | 15:46 |
fungi | i had a scholastic aptitude for the sat test | 15:47 |
corvus | i did sit for the sat | 15:48 |
corvus | so if i can change the topic for a moment; i'm looking into removing the "requires/provides: docker-image" we have in our opendev base image jobs and replacing them with more specific attributes on the leaf-node jobs like "provides/requires: python-base-image" so that zuul can be smarter about serializing them (right now, we've reduced zuul's gate queue to a serialized window of 1 because it thinks every change depends on the container images of the change ahead). | 15:50 |
corvus | that seems pretty straightforward up to this point: the "requires:" is not on the build job, but rather, on the buildset-registry job, because the buildset registry role pulls the previous artifacts in when it starts. | 15:51 |
corvus | that means that i'd need to put "requires" lines on either the leaf-node job or the buildset registry job, depending on whether the repo is using it (ie, zuul does not have a separate buildset registry job, so i would put requires on "zuul-quick-start", but nodepool does, so i would need to duplicate them there) | 15:52 |
*** bolg has quit IRC | 15:52 | |
corvus | that sounds messy, and prone to error if someone were to switch from no buildset-registry job to using one | 15:52 |
corvus | i was thinking maybe i could have zuul "pull-up" the requirements of any dependent jobs (so buildset-registry gets the requirements of zuul-quick-start automatically in nodepool). but i worry that could go too far (for instance, if you made a tree of jobs that started with a linter, then ran buildset-registry if linting passed. you'd have the linter waiting for container builds of changes ahead) | 15:53 |
corvus | maybe it would be okay to move the role that populates the buildset registry into the base image building job instead of the buildset-registry job? | 15:54 |
corvus | you could end up with some jobs which use the buildset-registry starting before it was completely populated with images from previous builds, but that should be okay, because presumably, those jobs don't "require" the images that haven't been imported yet | 15:55 |
corvus | i think i'll look into that, unless someone has another idea (or can see why that wouldn't work) | 15:56 |
mordred | corvus: wow, that's a second coffee question ... | 15:56 |
mordred | corvus: but on first read it seems reasonable | 15:56 |
mordred | and I support the goal state :) | 15:56 |
corvus | yeah, i mean, i should have breakfast too. :) | 15:56 |
mordred | corvus: you're pre-breakfast? I'm violating one of my cardinal rules - never get between corvus and breakfast | 15:57 |
mordred | (never start a land war in asia is still a more important rule) | 15:58 |
Shrews | that's more of a 4th coffee or 1st coffee-flavored beer question | 16:00 |
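The serialization problem corvus raises above comes down to tag overlap: if every image job provides and requires the same generic tag, every change appears to depend on every change ahead of it. The sketch below is a toy model of that idea under stated assumptions, not Zuul's scheduler logic. The function name `blocked_by` is hypothetical, as is the tag `nodepool-image`; `docker-image` and `python-base-image` are taken from the discussion.

```python
def blocked_by(requires, provides_ahead):
    """Return indexes of changes ahead whose provides overlap `requires`.

    requires: artifact tags the job in the current change requires.
    provides_ahead: one list of provided tags per change ahead in the queue.
    """
    wanted = set(requires)
    return [i for i, provides in enumerate(provides_ahead)
            if wanted & set(provides)]


# With one generic tag, every change ahead looks like a dependency.
generic = blocked_by(["docker-image"],
                     [["docker-image"], ["docker-image"]])

# With specific tags, only the change that really provides the image does.
specific = blocked_by(["python-base-image"],
                      [["nodepool-image"], ["python-base-image"]])
```

In this toy model `generic` lists both changes ahead while `specific` lists only the second, which is the effect the more specific attributes are meant to achieve.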
tobiash | zuul-maint: we're currently facing performance issues with github so I took some time to dive into optimizing the api requests we're doing towards github. This reduces the number of requests needed to enqueue a change from 12 to 6 while four of the saved requests are in the critical code path in the run_handler: https://review.opendev.org/#/q/topic:github-optimization | 16:01 |
fungi | i'm still trying to wrap my head around the premise with the buildset-registry requires question... maybe i should rethink my one-coffee-a-day policy | 16:09 |
tristanC | tobiash: i thought github graphql endpoint needs a pageInfo element to get a cursor and be able to query more than 100 results or something. | 16:11 |
tobiash | tristanC: yes, the initial assumption is that nobody has more than 100 protection rules and check runs | 16:12 |
mordred | tobiash: that seems like a reasonable starting place :) | 16:13 |
tobiash | I'll look into (partly generic) paging if needed | 16:13 |
tristanC | tobiash: isn't the query also used to get mergeable PR? | 16:13 |
tobiash | tristanC: you mean if the pr can be merged without conflicts? | 16:14 |
tobiash | regarding cursor on check suites/runs: zuul has one checksuite/run per pipeline, so I'm pretty sure a limit of 100 is safe there | 16:15 |
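The paging tristanC mentions follows the usual GraphQL connection pattern: request a page, read `pageInfo.hasNextPage` and `pageInfo.endCursor`, and pass the cursor back as `after` until there are no more pages. Below is a minimal, transport-agnostic sketch. The `fetch_page` callable and the shape of its return value (a dict with `nodes` and `pageInfo`) are assumptions for illustration and do not come from the change under review.

```python
def fetch_all(fetch_page, page_size=100):
    """Collect every node from a cursor-paginated connection.

    fetch_page(first, after) is a caller-supplied function that performs
    the actual query and returns a dict shaped like:
        {"nodes": [...],
         "pageInfo": {"hasNextPage": bool, "endCursor": str}}
    """
    nodes = []
    cursor = None  # None requests the first page
    while True:
        page = fetch_page(first=page_size, after=cursor)
        nodes.extend(page["nodes"])
        info = page["pageInfo"]
        if not info["hasNextPage"]:
            return nodes
        cursor = info["endCursor"]
```

If the 100-item assumption tobiash describes holds, the loop makes exactly one request, so a generic helper like this adds no extra API calls in the common case.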
*** felixedel has quit IRC | 16:23 | |
mordred | tobiash: the codeowners patch can't use the graphql too I suppose? | 16:24 |
mordred | tobiash: NEVERMIND | 16:25 |
mordred | tobiash: I see it now | 16:25 |
tobiash | mordred: it does, but I'd like to refactor it to match the structure of the parent | 16:26 |
tobiash | I didn't wip it because it never got reviews ;) | 16:26 |
mordred | tobiash: it's always so scary! | 16:26 |
mordred | tobiash: but yeah, I think updating it to match will be nice | 16:27 |
tobiash | yes I know that this is scary (and only needed due to api limitations) | 16:27 |
mordred | the new graphql patch is actually much more understandable than I thought when I saw the commit come in originally | 16:28 |
tobiash | mordred: that was the goal :) | 16:29 |
tobiash | and if you get used to it, the graphene based mock of the github api is quite simple | 16:29 |
mordred | tobiash: you've reminded me that I have a very out-of-date todo list item to migrate the zuul website to gatsby ... largely because gatsby also uses graphql and doing a website with it is what got me to learn the graphql in the first place | 16:30 |
mordred | so maybe I should actually get around to doing that | 16:30 |
*** mattw4 has joined #zuul | 16:33 | |
tobiash | mordred: oops, I forgot one in the topic: https://review.opendev.org/#/c/709149 | 16:35 |
*** jamesmcarthur has joined #zuul | 16:36 | |
mnaser | some good discussion. I agree that all drivers should support it | 16:39 |
mnaser | I will work on something .. | 16:40 |
mnaser | ok i've made two discoveries | 16:51 |
mnaser | there is a 'baseurl' config option inside the gerrit driver.. | 16:52 |
mnaser | which seems like it would be the actual perfect case to use for this, so i'd *maybe* argue that if it's not being respected, maybe that's an issue, because it seems to be used for things like the gitweb url | 16:53 |
mnaser | and to me it seems like if i add that setting, this solves my issue? | 16:53 |
mnaser | (it defaults to the value of 'server') | 16:54 |
corvus | mnaser: oh i forgot about that, sorry. yeah, that's worth a try :) | 16:54 |
corvus | mnaser: oh, no that's not quite it | 16:55 |
mnaser | if it's not it, i'm struggling a little bit in finding the code that 'generates' the url which ends up going into the database (which seems to be item.change.url) which is what is provided by ref_url inside the API | 16:56 |
corvus | mnaser: baseurl is used for rest api requests. basically, it's the http equivalent of 'server' | 16:56 |
mnaser | ahhh | 16:56 |
corvus | (server for ssh, baseurl for http) | 16:56 |
mnaser | ok, yes makes sense | 16:57 |
corvus | so yeah, i think the main challenge is going to be separating out all of the internal vs external uses | 16:57 |
mnaser | ok yeah i'm starting to see how this can be a little hard | 16:58 |
mnaser | it might involve adding an extra thing to the zuul change model which is item.external_url or whatever it's called | 16:59 |
mnaser | and then we save that in the sql reporter instead of the item.change.url | 16:59 |
corvus | i don't think the model should need to be changed | 16:59 |
mnaser | i guess this also means that unknowingly my depends-on with urls will only work with Depends-On: http://gerrit.gerrit/1234 | 16:59 |
mnaser | i'm trying to think out loud with the least impactful change that doesnt end up touching a lot of different bits | 17:00 |
corvus | all the internal actions should be done via the source, trigger, and reporter interfaces; those are the only thing that needs to know the internal hostname | 17:00 |
corvus | (all of those are aspects of a connection) | 17:01 |
corvus | and yes, i think that's your current depends-on syntax | 17:01 |
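The split corvus describes (internal hostname for source, trigger and reporter actions; public hostname for anything shown to users) could be approached by rewriting URLs at the point where they are exposed. The sketch below only illustrates that idea. It is not an existing Zuul option or API: `to_public_url`, `internal_base`, `public_base` and `review.example.org` are all hypothetical, and only `http://gerrit.gerrit/1234` comes from the discussion.

```python
from urllib.parse import urlsplit, urlunsplit


def to_public_url(url, internal_base, public_base):
    """Rewrite `url` onto `public_base` if it points at `internal_base`.

    URLs on any other scheme/host are returned unchanged.
    """
    u = urlsplit(url)
    internal = urlsplit(internal_base)
    public = urlsplit(public_base)
    if (u.scheme, u.netloc) != (internal.scheme, internal.netloc):
        return url
    # Keep path, query and fragment; swap only the scheme and host.
    return urlunsplit((public.scheme, public.netloc,
                       u.path, u.query, u.fragment))
```

For example, `to_public_url("http://gerrit.gerrit/1234", "http://gerrit.gerrit", "https://review.example.org")` yields a link on the public host, while the connection itself keeps talking to the in-cluster name.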
*** mattw4 has quit IRC | 17:03 | |
*** mattw4 has joined #zuul | 17:04 | |
openstackgerrit | Merged zuul/zuul master: Don't set untouched refs of the repo state twice. https://review.opendev.org/707857 | 17:05 |
*** mattw4 has quit IRC | 17:34 | |
*** evrardjp has quit IRC | 17:34 | |
*** evrardjp has joined #zuul | 17:35 | |
*** jpena is now known as jpena|off | 17:40 | |
openstackgerrit | Merged zuul/zuul master: Don't fetch pull request twice for status event https://review.opendev.org/709149 | 17:43 |
*** mattw4 has joined #zuul | 17:44 | |
*** jamesmcarthur has quit IRC | 17:58 | |
*** jamesmcarthur has joined #zuul | 18:00 | |
*** jamesmcarthur has quit IRC | 18:05 | |
*** mattw4 has quit IRC | 18:05 | |
*** Goneri has quit IRC | 18:11 | |
*** Goneri has joined #zuul | 18:12 | |
*** jamesmcarthur has joined #zuul | 18:21 | |
*** chandankumar is now known as raukadah | 18:21 | |
*** michael-beaver has joined #zuul | 18:29 | |
*** jcapitao is now known as jcapitao_off | 18:35 | |
*** mattw4 has joined #zuul | 18:36 | |
*** tjgresha has joined #zuul | 18:36 | |
*** igordc has joined #zuul | 18:37 | |
mordred | corvus: reply from SpamapS in https://etherpad.openstack.org/p/zuulv4 | 18:39 |
*** igordc has quit IRC | 18:43 | |
corvus | mordred, SpamapS: i think the warning should come after we merge implicit sql reporters, and that's a semi-breaking change (at least, it's a change that operators *may* need to make carefully). but if we follow the sequence of: tell people that sql reporters will be required, make them implicit, make explicit sql reporters an error, then it should be relatively smooth for most folks? unless i'm missing something | 18:46 |
*** rlandy is now known as rlandy|mtg | 18:46 | |
*** tjgresha has quit IRC | 18:49 | |
*** tjgresha has joined #zuul | 18:52 | |
openstackgerrit | James E. Blair proposed zuul/nodepool master: Use explicit provides/requires for container jobs https://review.opendev.org/710115 | 18:52 |
*** tosky has quit IRC | 18:56 | |
openstackgerrit | James E. Blair proposed zuul/zuul master: Use explicit provides/requires for container jobs https://review.opendev.org/710116 | 18:58 |
*** dpawlik has quit IRC | 19:01 | |
*** felixedel has joined #zuul | 19:05 | |
*** mattw4 has quit IRC | 19:09 | |
*** mattw4 has joined #zuul | 19:17 | |
*** dpawlik has joined #zuul | 19:18 | |
*** rlandy|mtg is now known as rlandy | 19:27 | |
*** jamesmcarthur has quit IRC | 19:31 | |
*** jamesmcarthur has joined #zuul | 19:40 | |
*** jamesmcarthur has quit IRC | 19:58 | |
*** sshnaidm is now known as sshnaidm|afk | 20:11 | |
*** jcapitao_off has quit IRC | 20:17 | |
*** jamesmcarthur has joined #zuul | 20:21 | |
SpamapS | corvus: Was just thinking, start telling people about it now, and then the jump to 4 is the only place where anybody has to sit and think about their reporters. | 20:33 |
SpamapS | but I'm fine with it as-is, just wondering if we can be more aggressive while crossing a major release boundary. | 20:33 |
*** mattw4 has quit IRC | 20:42 | |
*** mattw4 has joined #zuul | 20:42 | |
*** michael-beaver has quit IRC | 21:19 | |
openstackgerrit | Ian Wienand proposed opendev/zone-zuul-ci.org master: git.zuul-ci.org : point to static.opendev.org https://review.opendev.org/710142 | 21:23 |
corvus | SpamapS: i think if we want people to be able to upgrade without significant downtime we have to have a period where explicit reporters are not an error, but zuul uses implicit reporters. so if we move the explicit-reporters-are-error phase back to 4.0, we'll need to make implicit reporters happen before 4.0. even though that might be okay (i suspect few or zero users have more (or less) than one sql connection in a tenant at this point), if they did, that'd be a pretty rough change to do without major signaling ("i know your pipeline says this, but zuul is really going to do this instead") | 21:23 |
*** adam_g has quit IRC | 21:36 | |
*** jamesmcarthur has quit IRC | 21:36 | |
*** jamesmcarthur has joined #zuul | 21:36 | |
*** adam_g has joined #zuul | 21:37 | |
*** jamesmcarthur has quit IRC | 21:44 | |
*** jamesmcarthur has joined #zuul | 21:45 | |
*** adam_g has quit IRC | 21:46 | |
tristanC | considering zuul doesn't have stable branch and is kind of a rolling release, shouldn't we use the v4 increment as a scaled-out scheduler marketing version number | 21:47 |
*** adam_g has joined #zuul | 21:47 | |
tristanC | otherwise, according to semver, asking user to configure zk-auth should be the 4.0.0 | 21:52 |
fungi | i'm wary of any ties between version number choices and "marketing" | 21:54 |
tristanC | thus i'm in favor of adding support for zk auth, implicit sql, and the other thing we'll need for scaled-out scheduler as 3.x version, and then use zuul v4 for the scaled-out feature | 21:54 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Increase timeout in timeout test https://review.opendev.org/710146 | 21:56 |
corvus | zk auth is not a breaking change, implicit sql is (we're literally saying that zuul is going to ignore the configuration you tell it to, and if you dont have a sql connection configured, it won't start) | 21:58 |
*** jamesmcarthur has quit IRC | 21:59 | |
corvus | our interpretation of how to apply the principles of semver to this application (zuul is not a library) is that architectural changes or significant breaking changes for operators get a major version, and breaking changes for users get a minor. | 21:59 |
corvus | it's vague, but it's something. | 22:00 |
tristanC | corvus: i'm not sure what `asking the user to configure zk auth` implies then. it seems like something major where you can't just update zuul | 22:00 |
*** jamesmcarthur has joined #zuul | 22:00 | |
corvus | tristanC: we land zk auth asap, and zuul will work with both but not require auth. users can configure auth anytime between now and 4.0. probably at 4.0 we require auth? | 22:01 |
*** jamesmcarthur has quit IRC | 22:01 | |
corvus | yes, that's #4 | 22:01 |
tristanC | corvus: can't the implicit reporter also land asap too? | 22:02 |
corvus | tristanC: no, an implicit sql reporter is 2 breaking changes: 1) it will ignore what you have in your pipeline; 2) it will break if there's no sql connection | 22:02 |
corvus | it's mostly #1 that i think is intractable (we could obviously make #2 softer) | 22:03 |
tristanC | why not add the sql reporter by default so that users can remove the reporter anytime between now and 4.0, and probably at 4.0 we require no sql connection? | 22:04 |
tristanC | (when sql connection exists) | 22:05 |
*** jamesmcarthur has joined #zuul | 22:05 | |
corvus | sure, we can do that, but doing something unexpected (contrary to what the user configured) is a big deal, and we should signal it. i think we should signal it with a major version bump. | 22:06 |
tristanC | alright, then why not make scaled-out scheduler v5.0? | 22:07 |
corvus | i think we have two parallel issues here -- i think everyone wants the ha scheduler work to proceed with minimal disruption. if you ignore the "release" steps in https://etherpad.openstack.org/p/zuulv4 then i think that process achieves it. | 22:07 |
corvus | so then the question is, which of those steps necessitate releases. i think: implied sql reporter (with required sql conn), and required zk auth are breaking changes. that plan proposes the 4.0.0 release be those. i don't think there are any further breaking changes for operators. | 22:09 |
corvus | if we're not happy with those version numbers, i see 2 alternatives: | 22:09 |
corvus | a) decide that major version number bumps should not be used to signal major changes to operators and instead should be used for marketing. in which case, we just release 4.0.0 at the end of that process. | 22:10 |
tristanC | i meant, between the proposed 4.0 and 4.1, it won't be 'breaking', but correctly operating a ha scheduler probably needs breaking change for operators (e.g. adds an extra scheduler service) | 22:10 |
corvus | b) some compromise where we use 4.0 to signal breaking changes and use 5.0 to signal completion of the feature set and a marketing milestone | 22:10 |
corvus | tristanC: when we have ha schedulers, it's still okay to run one. so at the point the feature is implemented, folks can just optionally run a second one. it's work, but it's not required or disruptive. | 22:11 |
tristanC | i think a or b are better than a 4.0 with only required sql reporter and 4.1 with ha scheduler | 22:11 |
*** rfolco has quit IRC | 22:12 | |
*** tosky has joined #zuul | 22:12 | |
mnaser | i know this might sound silly, but 'marketability' is something that makes sense for big feature changes. | 22:12 |
corvus | the plan in the etherpad doesn't even have 4.1 as ha scheduler, it's just the end of the deprecation period for sql reporters in pipelines. | 22:12 |
mnaser | for example, saying 'zuul' and 'zuul v3' told users a big story in the difference and feature set that comes with it | 22:13 |
tobiash | zuulv5 as ha scheduler :) | 22:13 |
mnaser | being able to easily know if you have a version of an application that has <big feature> is useful overall imho | 22:13 |
corvus | mnaser: yep, i recognize that. i would like to attract new users to whom the feature is important, but also don't want to piss off current users who are like "wtf you broke everything on 3.18.0?" :) | 22:13 |
tristanC | mnaser: i find it easier to communicate | 22:13 |
mnaser | yes, i agree it's easier to communicate, but i also agree on not breaking things in 3.18.0 | 22:14 |
tristanC | corvus: oh my bad, i read the pad as 4.1 is ha scheduler | 22:14 |
corvus | so yeah, if we don't think (just random guess here) 4.3 == ha scheduler is marketable enough, i'm not opposed to zuul v5 for that. | 22:14 |
corvus | i think our users would forgive v5 == no new changes after 4.X other than completion of the ha scheduler work more than breaking things on 3.18.0. | 22:15 |
corvus | (and i think if we want 4.0.0 to be all the breaking changes plus the ha scheduler, we would need a feature branch) | 22:16 |
tristanC | corvus: zuul v5 for ha scheduler sounds more aligned with semver, but zuul v4 would work for me too | 22:16 |
corvus | it sounds like we may be able to obtain consensus on that; let's check with fungi, mordred, and SpamapS (see additional steps 12 and 13 in https://etherpad.openstack.org/p/zuulv4 -- basically, release 5.0.0 as a signal that ha scheduler work is complete) | 22:19 |
mnaser | maybe it's something to think about because it's kind of the first time that this happens (yes i acknowledge that it isn't really the first time because we're at zuul v3) but perhaps having some messaging around "zuul v3 is no longer supported/released" | 22:20 |
corvus | tristanC: that does make "error on explicit sql in pipelines" show up in 5.0, which is nice (but not strictly necessary according to our established practice) | 22:21 |
fungi | okay, catching up | 22:23 |
*** sshnaidm|afk has quit IRC | 22:23 | |
corvus | mnaser: indeed, both major version changes so far have been major architectural/breaking changes. 1->2 == jenkins http api -> gearman plugin; 2->3 == jenkins -> ansible. | 22:24 |
tristanC | fwiw i've been contemplating a haskell tool that automatically picks a version number based on code changes: https://kowainik.github.io/posts/policeman-bristol#briefing unfortunately that doesn't apply well to unsafe languages :) | 22:26 |
*** jamesmcarthur has quit IRC | 22:29 | |
* fungi wonders what qualifies as a "safe" language | 22:29 | |
corvus | one without side effects | 22:30 |
*** jamesmcarthur has joined #zuul | 22:30 | |
*** Goneri has quit IRC | 22:30 | |
fungi | i'm fine with 4.0.0 being the breaking changes and 5.0.0 signalling the completion of those features, though also if the breaking changes can't land at the same time having one be 4.0.0 and the other be 5.0.0 and then completion of both being 6.0.0 also works for me | 22:33 |
fungi | corvus: given the state of modern processors, i wonder if there is any language without side effects (even if unintentional ones) | 22:34 |
*** jamesmcarthur has quit IRC | 22:35 | |
*** jamesmcarthur has joined #zuul | 22:35 | |
tristanC | fungi: i meant a safe a language where a behavior change results in a semantic diff on the exposed interface, like what haskell or rust do | 22:39 |
*** jamesmcarthur has quit IRC | 22:49 | |
*** jamesmcarthur has joined #zuul | 22:54 | |
fungi | makes sense | 22:56 |
*** felixedel has joined #zuul | 23:07 | |
openstackgerrit | James E. Blair proposed zuul/zuul master: Store build.error_detail in SQL https://review.opendev.org/709857 | 23:11 |
*** jamesmcarthur has quit IRC | 23:15 | |
*** rlandy is now known as rlandy|bbl | 23:18 | |
SpamapS | corvus: I like that. I'm also in agreement with you either way on how to make the reporter switch (just wondered if we can skip a few steps really) | 23:28 |
mordred | I'm not convinced we need #13 (although I've also had the same thought myself - marketability IS A big deal) ... BUT - I don't think my being convinced or not convinced on 13 matters right now - the nice thing about 13 is that, once HA scheduler is done and we're happy - we could cut a v5 and make noise about it - OR - we could _not_ cut a v5 and instead say "LOOK, we just released HA scheduler as a point | 23:34 |
mordred | release because we are stable enough to do so" | 23:34 |
mordred | like - as much as I want to tout the new feature, I'm also excited that the plan for HA scheduler can be done primarily without breaking changes, and the breaking changes we're doing in prep for it are also not particularly breaking either (they are - by our definition, but compared to _other_ breaking changes out there - I think we're doing pretty good) and it's something worth trumpeting. | 23:35 |
mordred | but it's also just numbers, so I could totally get behind tagging a v5 if it'll help us talk about it | 23:36 |
*** Defolos has quit IRC | 23:48 |
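
The versioning rule corvus describes at 21:59 (architectural changes or significant breaking changes for operators get a major version, breaking changes for users get a minor) can be sketched roughly as below. This is only an illustration of the rule as stated in the channel, not anything Zuul ships: the `Change` categories, the `next_version` helper, and the choice to treat everything else as a patch bump are assumptions made for the example, and the starting version is illustrative.

```python
from enum import Enum
from typing import Iterable, Tuple

Version = Tuple[int, int, int]


class Change(Enum):
    """Hypothetical change categories, named after the rule quoted above."""
    ARCHITECTURAL = "architectural"
    BREAKING_FOR_OPERATORS = "breaking-for-operators"
    BREAKING_FOR_USERS = "breaking-for-users"
    OTHER = "other"


def next_version(current: Version, changes: Iterable[Change]) -> Version:
    """Pick the next version number from the kinds of change in a release."""
    major, minor, patch = current
    kinds = set(changes)
    # Architectural or operator-breaking changes signal a major release.
    if kinds & {Change.ARCHITECTURAL, Change.BREAKING_FOR_OPERATORS}:
        return (major + 1, 0, 0)
    # Changes that break user-facing (job/pipeline) config get a minor release.
    if Change.BREAKING_FOR_USERS in kinds:
        return (major, minor + 1, 0)
    # Assumption: anything else is a patch release; the log does not say.
    return (major, minor, patch + 1)


# Implied sql reporter and required zk auth are operator-breaking per corvus.
print(next_version((3, 17, 0), [Change.BREAKING_FOR_OPERATORS]))  # (4, 0, 0)
print(next_version((4, 0, 0), [Change.BREAKING_FOR_USERS]))       # (4, 1, 0)
```

Under this reading, a purely additive feature such as an optional second scheduler would not force a major bump by itself, which is why the channel treats a 5.0.0 for the ha scheduler as a signalling choice rather than a requirement of the rule.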