openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: test_v3: replace while loop with iterate_timeout https://review.opendev.org/662112 | 00:29 |
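The `iterate_timeout` helper referenced in that change lives in Zuul's test utilities; roughly, it replaces a hand-rolled `while` polling loop with a bounded iterator that raises on timeout. A minimal sketch (the exact signature and sleep interval here are assumptions, not the real implementation):

```python
import time


def iterate_timeout(max_seconds, purpose):
    """Yield a loop counter until max_seconds elapses, then raise.

    Callers write `for _ in iterate_timeout(30, "server to boot"):`
    and break out of the loop once their condition holds; if it never
    holds, the generator raises instead of spinning forever.
    """
    start = time.time()
    count = 0
    while time.time() < start + max_seconds:
        count += 1
        yield count
        time.sleep(0.01)  # assumed pause between polls
    raise Exception("Timeout waiting for %s" % purpose)
```

Usage is the pattern the review replaces: `for _ in iterate_timeout(5, "job to start"): if ready(): break`.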
*** ianychoi has quit IRC | 00:56 | |
pabelanger | clarkb: when you have time to review: https://review.opendev.org/661866/ | 01:38 |
*** threestrands has joined #zuul | 01:56 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: test_v3: replace while loop with iterate_timeout https://review.opendev.org/662112 | 02:45 |
*** rlandy|bbl has quit IRC | 03:42 | |
*** threestrands has quit IRC | 03:45 | |
*** threestrands has joined #zuul | 04:05 | |
*** threestrands has quit IRC | 04:06 | |
*** raukadah is now known as chandankumar | 04:31 | |
*** saneax has joined #zuul | 04:57 | |
*** pcaruana has joined #zuul | 05:00 | |
openstackgerrit | Mark Meyer proposed zuul/zuul master: Extend event reporting https://review.opendev.org/662134 | 06:02 |
*** bjackman has joined #zuul | 06:44 | |
*** ianychoi has joined #zuul | 06:57 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: executor: run cleanup playbook on stop https://review.opendev.org/661881 | 07:27 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: docs: add cleanup-run documentation https://review.opendev.org/662147 | 07:27 |
*** jpena|off is now known as jpena | 07:36 | |
*** toabctl has quit IRC | 07:50 | |
openstackgerrit | Andriy Shevchenko proposed x/pbrx master: Update home-page https://review.opendev.org/630132 | 08:44 |
*** bjackman has quit IRC | 08:45 | |
*** bjackman has joined #zuul | 08:46 | |
*** saneax has quit IRC | 08:47 | |
*** panda is now known as panda|ruck | 09:17 | |
openstackgerrit | Slawek Kaplonski proposed zuul/zuul-jobs master: Add role to fetch journal log from test node https://review.opendev.org/643733 | 09:37 |
*** electrofelix has joined #zuul | 09:44 | |
openstackgerrit | Slawek Kaplonski proposed zuul/zuul-jobs master: Add role to fetch journal log from test node https://review.opendev.org/643733 | 10:42 |
*** bjackman_ has joined #zuul | 10:51 | |
*** bjackman has quit IRC | 10:54 | |
*** jpena is now known as jpena|lunch | 11:02 | |
*** tosky has joined #zuul | 12:14 | |
*** jpena|lunch is now known as jpena | 12:25 | |
*** sshnaidm|off has quit IRC | 12:32 | |
*** rlandy has joined #zuul | 12:33 | |
*** sshnaidm has joined #zuul | 12:49 | |
openstackgerrit | Mark Meyer proposed zuul/zuul master: Build a slack integration https://review.opendev.org/662208 | 13:11 |
*** ofosos has joined #zuul | 13:14 | |
ofosos | The Slack integration is very much WIP; it lacks docs, a connection interface, and probably a lot more. I'll polish it by Monday. | 13:15 |
pabelanger | there is some interest from ansible network folks in a slack reporter, so I might take a peek | 13:16 |
ofosos | don't, it's crap right now. We have a public holiday and I'm on-call and just hacked something together | 13:16 |
ofosos | It currently lacks a reporter; it's a chat bot that you can tell to run pipelines | 13:17 |
ofosos | The reporter is next, but not today :) | 13:17 |
ofosos | Probably on the weekend. | 13:17 |
pabelanger | I can't comment on the trigger option; I know there has been some discussion in the past about users being able to do that more freely, but I haven't been following it | 13:18 |
pabelanger | but reporter is of some interest | 13:18 |
AJaeger | ofosos: check also https://review.opendev.org/536391 for a previous attempt and see discussion there | 13:20 |
*** pcaruana has quit IRC | 13:23 | |
ofosos | AJaeger: thanks for the pointer | 13:23 |
ofosos | essentially I need both a trigger and a reporter, I want to run infrastructure repos in zuul | 13:24 |
fungi | as part of a team who runs infrastructure repos in zuul, i wonder what the desire for a manually-triggered pipeline is, unless the idea is to be able to drive things like maintenance activities with zuul and not have the build associated with a particular code event? | 13:35 |
AJaeger | even those you can trigger with an "approve" of a change. A manually-triggered post pipeline (queue everything and release at a certain time) comes to mind... | 13:36 |
*** bjackman_ has quit IRC | 13:36 | |
fungi | yeah, i mean you could trigger off approval of a code-reviewed maintenance plan in a planning repository or something, i suppose | 13:37 |
AJaeger | exactly | 13:38 |
fungi | but i can see the allure of starting the upgrade-firmware-on-all-the-ethernet-switches job on demand, when there's a sufficient critical mass of other sysadmins on hand to deal with any fallout... it's just that zuul's model is built around running sets of builds for events related to a git repository (even the periodic pipeline trigger expects a repository associated with any buildset) | 13:39 |
ofosos | How is this different from the TimeTrigger implementation? | 13:43 |
ofosos | I think I basically worked off that piece of code | 13:43 |
fungi | it would still be relative to the state of a particular repository and run a statically-defined set of builds, if it's like the timer trigger | 13:44 |
ofosos | What we want to do is specify the ideal state in a repo, gate it and then roll it out. | 13:44 |
ofosos | But infrastructure has the tendency to degrade and we might need a way to manually trigger a deployment, without a code change | 13:45 |
fungi | ahh, and the "roll it out" part would be triggered by a human instead of the timer | 13:45 |
fungi | or instead of happening immediately after merging | 13:45 |
ofosos | Kind of, the default is to roll it out as part of the gate | 13:45 |
ofosos | But, if stuff breaks, we need to redeploy | 13:45 |
fungi | ahh, so for rerunning | 13:46 |
ofosos | So we'll have a `deployment.yaml' which specifies the state that the system should be in | 13:46 |
fungi | there is the zuul rpc command-line utility, which has an enqueue-ref subcommand | 13:46 |
ofosos | And rerunning will just pick up any drift and smooth it out | 13:46 |
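The rerun-to-smooth-out-drift idea described here is idempotent convergence: diff the declared state against what is actually running and apply only the difference, so a rerun with no drift is a no-op. A toy sketch (the actual `deployment.yaml` schema is not shown in the discussion, so the dicts here are stand-ins):

```python
def plan(desired, observed):
    """Return the actions needed to converge observed state onto desired.

    `desired` is what deployment.yaml declares; `observed` is what is
    actually running. Rerunning the deploy job just recomputes and
    applies this plan, which is what picks up and smooths out drift.
    """
    actions = []
    for name, spec in desired.items():
        if name not in observed:
            actions.append(("create", name, spec))
        elif observed[name] != spec:
            actions.append(("update", name, spec))
    for name in observed:
        if name not in desired:
            actions.append(("delete", name))
    return actions
```

The useful property for the manual-retrigger case: running `plan` twice against a converged system yields an empty action list, so reruns are safe.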
fungi | we frequently rely on that for rerunning things... so i guess this interface would be similar? | 13:47 |
ofosos | Wait... | 13:47 |
*** pcaruana has joined #zuul | 13:47 | |
ofosos | https://imgur.com/a/f6LCgAx | 13:49 |
ofosos | Works like this | 13:49 |
ofosos | In this case the pipeline is just named check, but could be anything | 13:49 |
fungi | so functionally similar to https://zuul-ci.org/docs/zuul/admin/client.html#enqueue-ref | 13:49 |
ofosos | Yes | 13:50 |
pabelanger | ofosos: I've had good success with promote pipeline and periodic pipeline. After gate, promote runs. | 13:50 |
pabelanger | and if that fails for some reason, periodic will then run | 13:50 |
pabelanger | and hopefully fix | 13:50 |
ofosos | We don't do any magic parameters that are outside git, we just need a way to rerun things | 13:50 |
fungi | i'm guessing your "check" pipeline is ref-oriented and not change-oriented, or else running a buildset on a git head wouldn't be doable | 13:50 |
corvus | if the jobs don't make too many assumptions about zuul.* vars, both might be okay | 13:51 |
fungi | mmm, good point | 13:52 |
ofosos | I'm just sitting in a wood workshop and fooling around, we don't have such intricate pipelines yet. I'll have to see how it works out tomorrow. | 13:53 |
ofosos | Right now it's just in a proof-of-concept state. | 13:53 |
fungi | in our (opendev deployment) case the closest equivalent would probably be either using `zuul enqueue ...` on our promote pipeline or `zuul enqueue-ref ...` on our post pipeline to rerun (run a new buildset for) the jobs which originally ran after a change successfully merged | 13:54 |
corvus | but yeah, part of the refactor into triggers/sources/reporters in v3 was to accomodate this kind of abstraction -- so i think it should work out | 13:54 |
fungi | or, well, not necessarily the jobs which originally ran but the jobs which are configured to run currently, which is usually the same (but sometimes it's not, and sometimes that's also why we want to reenqueue them) | 13:55 |
fungi | well, anyway, my point was that the same could *probably* be accomplished by a chatbot which ran zuul rpc subcommands or implemented the same rpc client interfaces | 13:56 |
fungi | with the right sort of socket configuration and protections you could probably even put it on a separate machine from the scheduler | 13:57 |
fungi | (might need a trivial proxy to go from a tcp socket to a named pipe) | 13:58 |
pabelanger | fungi: I know a while back, maybe gozer folks, talked about zuul client being able to do remote rpc commands (so we didn't need to expose ssh to users). | 14:00 |
clarkb | it uses gearman iirc | 14:05 |
corvus | there's a lot of opportunity for ux improvement with chat triggers/reporters -- if we add an irc bot, we could "recheck 661627" for example. | 14:06 |
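A chat trigger along these lines is mostly command parsing in front of the existing enqueue RPC. A hypothetical sketch of the parsing half (the request field names and defaults below are made up for illustration, not Zuul's actual RPC schema):

```python
import re

# Matches "recheck 661627" or "recheck 661627,3" (change,patchset).
RECHECK = re.compile(r"^recheck\s+(?P<change>\d+)(?:,(?P<patchset>\d+))?$")


def parse_command(line, tenant="example", pipeline="check"):
    """Turn a chat message into an enqueue request dict, or None.

    A real bot would hand the resulting dict to Zuul's RPC client
    (the same path the enqueue/enqueue-ref subcommands use).
    """
    m = RECHECK.match(line.strip())
    if not m:
        return None
    change = m.group("change")
    patchset = m.group("patchset") or "1"  # assumed default
    return {
        "tenant": tenant,
        "pipeline": pipeline,
        "change": "%s,%s" % (change, patchset),
    }
```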
corvus | if you're interested in the user-accessed-rpc approach, see the web-admin api spec | 14:06 |
*** chandankumar is now known as raukadah | 14:13 | |
SpamapS | ofosos:FYI, I have had the same desires as you for manual triggering, and I've found empty commits work better. | 14:14 |
SpamapS | ofosos:the only problem is using files matchers and such, which don't trigger, and I've often thought that a header like `Ignore-Matchers: files` in the commit message would be a nice feature anyway. | 14:15 |
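The `Ignore-Matchers:` header floated here does not exist in Zuul; if it did, extracting it from a commit message might look like this sketch (header name and comma-separated semantics are the hypothetical ones from the message above):

```python
def ignored_matchers(commit_message):
    """Return the set of matcher names an Ignore-Matchers footer disables.

    With this hypothetical feature, a files matcher would be skipped
    when "files" appears in the returned set, letting an empty or
    trivial commit trigger jobs it otherwise would not.
    """
    ignored = set()
    for line in commit_message.splitlines():
        if line.lower().startswith("ignore-matchers:"):
            value = line.split(":", 1)[1]
            ignored.update(m.strip() for m in value.split(",") if m.strip())
    return ignored
```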
ofosos | SpamapS: I'm not really taken by pushing an empty commit and opening a PR based on that. And I'm not sure what Bitbucket will do when I try to open a PR with an empty commit. | 14:18 |
SpamapS | ofosos: GitHub and Gerrit handle it fine. | 14:18 |
SpamapS | Worst case, you tack a line into a file, manual_runs.txt. | 14:18 |
SpamapS | Make a script, like `date >> manual_runs.txt && git add manual_runs.txt && git commit -m "manual_run by $USER" && git push origin manual-run-$USER-$(date +%Y%m%d%H%M%S) && bitbucket-client-open-thing` | 14:20 |
SpamapS | Point being, it's actually *immensely* valuable to have *everything* you ever did linked to git. | 14:20 |
ofosos | How do you handle versioning with this? I.e. I want to have well defined versions on master and ideally only roll those out. | 14:21 |
SpamapS | Especially if you have change management controls, the PR/review/whatever-bitbucket-calls-it becomes your paperwork. | 14:21 |
SpamapS | ofosos:tag the new commit? | 14:21 |
SpamapS | I've actually given up on human-defined versions. My devs tag the repo, but everything is tied to the Zuul build UUID, which links to the git commit, so the versions are just a human-readable summary of important commits. | 14:23 |
ofosos | If some piece of infra broke down, why would this justify having a new version of the software? With infrastructure repos this might seem ok, but if I have a joint repo for infra & software it looks foreign | 14:23 |
SpamapS | It's pretty common in practice to have "rebuild" versions of software. | 14:25 |
SpamapS | But, you can always have a repo that is just for triggering. | 14:25 |
corvus | SpamapS: i think at this point we could all learn from a conference presentation on your build/deployment practices. :) | 14:25 |
SpamapS | corvus: :-D | 14:26 |
SpamapS | I should probably submit an abstract for Shanghai eh? ;) | 14:26 |
SpamapS | ofosos: so yeah, one interesting thing you can do with Zuul is attach jobs to repos that aren't the main focus of the job. So you could have a deploy job that requires the 'manual-triggers' project, which is just for recording manual triggers. | 14:29 |
SpamapS | And don't think I haven't tried Slack integration in a similar fashion. :) | 14:29 |
ofosos | So how does the manual-triggers project look? | 14:30 |
ofosos | like | 14:30 |
SpamapS | We carried an experimental patch on our Zuul at GoDaddy for a while. But ultimately, I found that git was still the better way to trigger, and I reverted the slack patch, and wrote a slack-notify role. | 14:30 |
ofosos | Is that slack-notify role somewhere available? | 14:31 |
SpamapS | ofosos: README and maybe a zuul.yaml. | 14:31 |
SpamapS | ofosos:it's been in review forever.. let me dig out the link | 14:31 |
SpamapS | https://review.opendev.org/623594 | 14:31 |
SpamapS | Testing it proved.. complicated. ;) | 14:32 |
SpamapS | Though I think I mostly just needed to change the test slack to something random so we didn't accidentally migrate opendev to slack. ;) | 14:32 |
ofosos | Manual triggers would be interesting, I'd like to loop in a manual trigger, since that might allow me to delegate credentials from the user that wants to run the pipeline to the build system. | 14:32 |
ofosos | That would be a different approach to constraining the build job to purpose built user credentials. | 14:34 |
SpamapS | Indeed! | 14:34 |
SpamapS | ofosos: happy to help you hash it out.. hopefully I've steered you in a happy direction. Have to run for a while. | 14:35 |
ofosos | Have a good run! I'll take care of my beechwood box and let the ideas percolate. | 14:36 |
*** zbr_ has joined #zuul | 15:06 | |
pabelanger | clarkb: corvus: tobiash: do you mind adding https://review.opendev.org/660856/ to your review pipeline, that is tristanC patch to skip file matcher on timer trigger pipelines. Would like to get your eyes on it please | 15:07 |
*** zbr has quit IRC | 15:09 | |
ofosos | Interestingly this need arose, when I talked to our devs. | 15:12 |
ofosos | SpamapS: how do you deploy an older version? | 15:13 |
clarkb | revert probably | 15:15 |
SpamapS | ofosos: clarkb is correct. The HEAD is what we deploy. Always. | 15:17 |
clarkb | that ensures you have history of the rollback which is nice | 15:18 |
SpamapS | Exactly. | 15:18 |
SpamapS | Rollbacks are changes. | 15:18 |
pabelanger | +1 | 15:18 |
ofosos | Sounds good | 15:18 |
SpamapS | I will say... the git->build->test->upload->deploy->test pipeline is too slow for prod, so there are hot-rollback procedures. | 15:18 |
SpamapS | For instance, if we can get back to a steady state by just rolling back a Kubernetes deployment, we do that. | 15:19 |
SpamapS | But for the most part, if it can wait 15 minutes, it goes through git. | 15:19 |
SpamapS | Looking in to more automatic ways to do that, like Spinnaker's canary deploys. | 15:20 |
SpamapS | Also I realized yesterday our stack is Kubernetes, Ansible, Terraform, Zuul... so.. we herd KATZ | 15:20 |
pabelanger | SpamapS: yah, I'm curious how often people use the UI for k8s / openshift to scale stuff up / down | 15:20 |
pabelanger | over say, gitops | 15:21 |
SpamapS | pabelanger: scaling should be handled by the pod autoscaler and AWS autoscaling groups. In theory. ;) | 15:21 |
SpamapS | The plumbing on that may have a few "TODO" comments. ;) | 15:21 |
SpamapS | But in general, scaling should always be in response to real data. | 15:22 |
SpamapS | Our git config just sets a baseline, which we try to make "10X more than normal traffic" if we can afford it. | 15:22 |
ofosos | Hmm, we're planning on having an entire blue/green cycle inside the gate pipeline including checking Splunk and Datadog. It feels like this will run for quite some time, especially because we're doing multi-region deployments. Any better ideas? | 15:25 |
SpamapS | ofosos:That probably belongs in a promote style pipeline, not gate. | 15:26 |
SpamapS | promote generally is tied to close+merge events, so that git reflects your intended state at any given time. | 15:26 |
SpamapS | That's how we do it anyway. gate is for validating the proposed git state, and staging artifacts. | 15:27 |
ofosos | SpamapS: can you point me to an example of how this looks in zuul? | 15:27 |
SpamapS | I think it could work to deploy in gate though. Haven't thought about that. | 15:27 |
pabelanger | yah, agree with SpamapS, we've been doing promote too for production things | 15:27 |
pabelanger | would be awkward in gate, in case that change didn't merge properly | 15:28 |
ofosos | What happens if a deployment fails in promote? | 15:29 |
SpamapS | ofosos: http://paste.openstack.org/show/752305/ | 15:29 |
SpamapS | that's our pipeline config | 15:30 |
SpamapS | Note that we don't actually use the `post` pipeline anymore. | 15:30 |
SpamapS | ofosos: promote fails notify slack, and often trigger monitors before that. ;) | 15:31 |
SpamapS | in theory they could comment on the PR too, but we don't do that. | 15:31 |
pabelanger | https://github.com/ansible/project-config/blob/master/zuul.d/pipelines.yaml#L96 | 15:32 |
ofosos | But then you're in a state where `master' (or whatever) will not result in a viable deployment. | 15:32 |
* SpamapS utterly failed at anonymizing that paste. :-P | 15:32 | |
pabelanger | that is our promote, based on SpamapS one | 15:32 |
pabelanger | we do comment too | 15:33 |
pabelanger | I find that helpful, in case promote doesn't work for some reason | 15:33 |
*** pcaruana has quit IRC | 15:33 | |
SpamapS | ofosos:correct! But that's a 3-alarm fire, and generally we have to decide whether to revert or handle urgently. | 15:33 |
SpamapS | if it happened in gate, we wouldn't actually have the state that resulted in the problem | 15:34 |
SpamapS | Since a gate fail would just reset, and the next thing in the queue would start deploying. | 15:34 |
SpamapS | and TBH, promote jobs don't always detect the failures | 15:34 |
SpamapS | Our promote job fails to wait for all of the things it started to finish, for instance. | 15:35 |
SpamapS | So we have to fall back on monitoring to alert us to that fail. | 15:35 |
SpamapS | This is great btw | 15:35 |
SpamapS | you are all writing my talk for me. | 15:35 |
SpamapS | ;) | 15:35 |
pabelanger | yah, I'd be interested to shadow your deployment for a day or so, to see how it all works :) | 15:36 |
corvus | SpamapS: "so then ofosos asked '...' and i said '...' and then pabalanger was like '...'!" | 15:37 |
fungi | i want to say corvus did a conference presentation involving opendev's promote pipeline usage model in the past couple of weeks | 15:37 |
fungi | i can't recall where that was merged though | 15:37 |
SpamapS | It's pretty boring. We deploy like, 2 python API's, some 3rd-party stuff, a frontend website, and a bunch of AWS plumbing with terraform. | 15:37 |
corvus | fungi: that was mostly focused on the k8s stuff, only incidental mention of zuul | 15:37 |
fungi | (i think i approved the addition in git though, so shame on my fallible memory) | 15:37 |
SpamapS | corvus: and he was like "shuuttt uuuup" and I was like "whaaatever". ;) | 15:37 |
fungi | corvus: ahh, okay | 15:38 |
corvus | SpamapS: totally krad talk. | 15:38 |
SpamapS | bruh, do you even zuul? | 15:38 |
pabelanger | SpamapS: corvus: I'd totally come to a talk about our irc discussions, and the solutions that came from them :) | 15:39 |
SpamapS | The more interesting work is where people keep reacting violently to Zuul's model and asking "what do we do when the build breaks?" ;) | 15:39 |
clarkb | SpamapS: semi related elsewhere I saw comments about "lets just merge this because we can wait around to fix the zuul gate" | 15:39 |
SpamapS | clarkb:can't? | 15:40 |
clarkb | er ya | 15:40 |
SpamapS | :) | 15:40 |
clarkb | basically they didn't understand that you can't merge unless the gate passes | 15:40 |
clarkb | it is a learning experience for many | 15:40 |
SpamapS | Yeah luckily our gate runs about 15 minutes, and we don't do clean-check, so I haven't had any "skip the gate" conversations as yet. | 15:40 |
pabelanger | clarkb: oh, yah, that happened recently for us too. The hard part right now, is humans still have admin access to repos zuul runs on. It has been difficult asking them to stop doing that workflow | 15:41 |
SpamapS | Oddly enough I also haven't had any "hey it's amazing master always works" compliments yet. Ungrateful devs. | 15:41 |
SpamapS | ofosos: one thing I haven't mentioned yet. We deploy master to our staging environment, but we have a separate branch, called prod, that we use for production. The staging environment is used as a buffer in case there are things people want to visually verify, etc. | 15:42 |
ofosos | SpamapS: interesting detail | 15:43 |
SpamapS | And lately I've had to yell at people to stop doing manual API testing there and write real tests for the gate. "SHIFT LEFT!" I scream into the void. | 15:43 |
ofosos | SpamapS: do you do canary or blue/green on any of the Terraform stuff? | 15:43 |
SpamapS | Nearly every failure we've had in deploy can be traced to things like "Wrong API key in prod config." or "Visual/Legal-review-needed detail missed in staging." | 15:44 |
pabelanger | clarkb: could I get a review on https://review.opendev.org/661866/ wouldn't mind seeing if we could land that | 15:44 |
*** tosky has quit IRC | 15:44 | |
SpamapS | ofosos:no, we just apply and slurp outputs. | 15:45 |
ofosos | We're in the CloudFormation 'rollback failed' nightmare camp | 15:45 |
clarkb | pabelanger: yes | 15:46 |
SpamapS | I refuse to use CloudFormation. There are real reasons, and also personal reasons, for that. ;) | 15:46 |
clarkb | pabelanger: see my question in #openstack-infra about ansible things if you have a moment too please :) | 15:46 |
ofosos | So no updates, just re-creates. Despite that, we want to roll out VPCs and the like with this, so there are really not going to be updates | 15:46 |
* SpamapS shoots an appreciative but worried glance at the Heat dev team. ;) | 15:46 | |
ofosos | It's not a nice experience, I agree with that. | 15:47 |
SpamapS | ofosos: With terraform we do actually sometimes apply before deploy, and we're talking about storing the plan in git so we don't have surprises. | 15:47 |
SpamapS | But so far none of that has bit us. | 15:47 |
pabelanger | clarkb: replied | 15:48 |
pabelanger | also, relocating network here is terrible | 15:48 |
SpamapS | ofosos: Terraform and CloudFormation are both very very powerful, with emergent behaviors if you don't rein them in. | 15:48 |
SpamapS | IMO nobody should ever use CloudFormation now that Terraform exists. | 15:48 |
fungi | SpamapS: nobody ever notices when things are always working. need to occasionally break stuff to get their attention ;) | 15:49 |
SpamapS | We plumb AWS[VPC, EC2, ELB, RDS] -> CloudFlare -> Kubernetes -> StatusCake all with terraform. I can't imagine how much bash/python/garbage code we'd have to write without it. | 15:49 |
clarkb | pabelanger: I had put it on the backburner after the request for a test but now see shrews feels that isn't necessary (I think it wouldn't hurt to have one but also agree seems unnecessary) | 15:50 |
ofosos | I arrived at this company with the impression that this discussion had already been settled, and people here still believe that terraform is somehow inferior | 15:50 |
clarkb | SpamapS: do you ipv6 with elb? can you maybe help docker to do the same with docker hub :) | 15:51 |
SpamapS | clarkb: nope, CloudFlare does all our ipv6. | 15:51 |
SpamapS | But AFAIK elbs always get an AAAA record and an ipv6 address. | 15:52 |
fungi | i guess dockerhub needs a cdn | 15:52 |
clarkb | docker hub uses cloudflare to serve the fs layer objects | 15:52 |
SpamapS | so if they're failing to republish the AAAA they're just lazy. | 15:52 |
fungi | huh... i wonder why the dockerhub elb lacks aaaa then | 15:52 |
clarkb | but the index is served behind elb | 15:52 |
fungi | yeah, could be the lazy | 15:52 |
SpamapS | clarkb:weird, I wonder why they wouldn't want to CloudFlare the index. | 15:53 |
clarkb | SpamapS: it leads to fun caching proxy rules | 15:53 |
clarkb | certainly would be easier for us if it was all at a single location | 15:53 |
SpamapS | hm I was wrong, you have to turn on ipv6 on classic ELB | 15:55 |
ofosos | Hmm, I'm wondering how to do a multi-region promote. Does it make sense to have the promote job run with a delay between regions? | 15:57 |
SpamapS | ofosos:I do a multi-region promote. I think at that point, it's just like any other automated deploy. If you need a delay, do a delay. We do them one after the other. | 15:59 |
ofosos | sounds reasonable | 15:59 |
ofosos | If one of them fails, do you rollback all previous deploys? | 16:00 |
SpamapS | But I actually would really like to hand this all off to Spinnaker and use some of their awesome primitives. | 16:00 |
SpamapS | ofosos:no, we just explode and notify. Some of our stuff self-heals though. The kubernetes deploys for instance do a good job of detecting readiness and not destroying working pods. | 16:01 |
SpamapS | I'm focused more on self-healing than auto-rollback. | 16:02 |
SpamapS | Which is where I want to get Spinnaker canaries involved. | 16:02 |
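The one-region-after-the-other promote with explode-and-notify (and no automatic rollback of earlier regions) that SpamapS describes can be sketched as follows; `deploy` and `notify` are caller-supplied stand-ins, not real APIs:

```python
def promote(regions, deploy, notify):
    """Deploy to each region in order; on failure, notify and stop.

    Earlier regions are left deployed (no auto-rollback), matching
    the "we just explode and notify" behavior: the return value or
    the notification records how far the rollout got.
    """
    done = []
    for region in regions:
        try:
            deploy(region)
        except Exception as e:
            notify("promote failed in %s after %s: %s" % (region, done, e))
            raise
        done.append(region)
    return done
```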
*** rfolco has quit IRC | 16:06 | |
fungi | our situation is probably a lot different, but we're performing more and more full-stack tests by using our deployment ansible playbooks to deploy test copies of sections of the infrastructure in virtual machines and exercise it | 16:09 |
fungi | before we approve, heck before we even review those modifications | 16:09 |
*** rfolco has joined #zuul | 16:11 | |
*** electrofelix has quit IRC | 16:16 | |
openstackgerrit | Merged zuul/nodepool master: Add error handling when cleaning up resources https://review.opendev.org/661866 | 16:18 |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: WIP: registry test job https://review.opendev.org/661327 | 16:20 |
SpamapS | fungi:that's exactly what we do too | 16:20 |
fungi | cool! | 16:21 |
fungi | so it's not just us then | 16:21 |
SpamapS | check does what it can w/o secrets. gate does more with secrets. It's a pretty tight funnel, and most of what makes it through does what the programmer/automator intended. | 16:21 |
SpamapS | For instance, in check, we deploy all of our kubernetes things into a minikube. | 16:22 |
SpamapS | In gate, we can stick them into a real k8s cluster because now we have creds in secrets. | 16:22 |
fungi | we use fake credentials in check/gate and just stand up copies of the additional services we want the service we're testing to interact with | 16:22 |
*** electrofelix has joined #zuul | 16:22 | |
SpamapS | fungi:one big difference for us, is that we have ~20 3rd party API's to deal with, so we can't stand up fakes. | 16:22 |
SpamapS | But in gate, we have fake-ish account creds to run tests with. | 16:23 |
ofosos | Hmm, hmm, hmm. The glue of my box is setting... | 16:23 |
fungi | but yeah, that doesn't catch possible differences on long-running persistent systems, and we also eschew proprietary software/services | 16:23 |
fungi | so our free software ideals help us out there | 16:23 |
SpamapS | I swim in a sea of OPP (other peoples programs). ;) | 16:23 |
fungi | we do rely on lots of opp, it's just opflossp | 16:24 |
ofosos | I still like to have some knobs: I'd like to model my playbooks in a way that I can trigger a hot rollback in some easy/general way and I'd like a knob for passing credentials to the promote job. I think I need to build another box to mull this through :) | 16:25 |
SpamapS | ofosos: one way to model this in zuul is to make a fast-track pipeline. I did that at GoDaddy, where certain things would trigger things to run w/o long tests. | 16:35 |
SpamapS | Like, a label on a PR, or a specific hot-fix branch. | 16:35 |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: WIP: registry test job https://review.opendev.org/661327 | 16:39 |
ofosos | SpamapS: what do you think about 'a thing' that uses AWS cognito to log you into a site and then passes your role/permissions/credentials to a job to execute with? I think that would be fairly easy to realize. | 16:42 |
SpamapS | ofosos: I think that might be fine. I choose to run everything through git though. ;) | 16:46 |
SpamapS | I am immune to eye rolls though. | 16:46 |
ofosos | Yeah, but then you end up with a) a lot of roles that can do a lot of stuff, in sum allowing zuul to do everything; or b) outright allowing zuul to do everything. Both are kind of bad, IMO. | 16:50 |
ofosos | This could be part of a promote pipeline, i.e. the pipeline requests credentials from a credential provider. | 16:51 |
SpamapS | ofosos:the roles aren't what enables things, the secrets are. And those should be tightly coupled to whatever lets people approve/merge changes. | 16:51 |
fungi | zuul's secrets model has had tons of thought put into its design specifically to allow these use cases, so that you don't need to have a separate secrets store for your jobs to authenticate to and fetch from | 16:52 |
ofosos | Having fixed credentials is kind of bad. I'd prefer to operate with temporary credentials, irrespective of how much brainpower people put into managing the fixed ones. | 16:55 |
fungi | ahh, so you have some separate system create new credentials on the fly and authorize them in the relevant services and hand those to jobs and then revoke them when the build completes? | 16:56 |
fungi | i guess it just depends on where you put that trusted central authority. in our case zuul is our trusted central authority for such purposes | 16:57 |
ofosos | Yep, that's right. But mostly they'll just time out. | 16:57 |
fungi | also the job needs some way to authenticate to the credential broker, so the fixed credentials it uses to authenticate to the broker become the new authority, in effect | 16:59 |
fungi | i suppose it does though give you the ability to insta-revoke access from zuul jobs to all systems by just revoking the credentials it uses to interface with the credential broker, rather than needing to individually revoke various credentials which were in the job secrets | 17:03 |
ofosos | The workflow would be: Job requests credentials, user logs into web ui and grants access with their credentials, job is passed a set of credentials which don't renew. Ideally the job could sign that request, so the user is presented with information about which system is requesting their permissions. | 17:03 |
ofosos | This is for prod; for test, we'll likely use a more relaxed policy. | 17:04 |
fungi | do the user's credentials also time out? otherwise what's to prevent the system from caching/saving and reusing them? | 17:08 |
fungi | i suppose otp could work there | 17:08 |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: WIP: registry test job https://review.opendev.org/661327 | 17:08 |
ofosos | We'll just generate temporary credentials based on the users permission level. | 17:11 |
fungi | and then the user enters those? | 17:15 |
fungi | or the user is entering durable credentials which the job then uses to obtain temporary credentials? | 17:15 |
ofosos | Nope, the user is authenticated, we check their permissions and generate appropriate temporary credentials based on their permission level to pass on to the job. | 17:15 |
ofosos | The user has to authenticate in some way. I think with our setup, this will likely be oauth. | 17:16 |
ofosos | The `thing' (service) just makes sure that the job never has durable credentials. | 17:16 |
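The broker ofosos sketches could be reduced to something like the following (all names, the permission model, and the TTL are illustrative assumptions; the real service would sit behind OAuth/Cognito and talk to the job over a signed request):

```python
import secrets
import time


def issue_credentials(user_permissions, requested, ttl=900, now=None):
    """Issue short-lived credentials scoped to a user's permissions.

    The user authenticates out of band; the service intersects what
    the job requested with what the user may grant, so the job only
    ever sees a non-renewing token that expires after ttl seconds.
    """
    now = now if now is not None else time.time()
    granted = set(requested) & set(user_permissions)
    if not granted:
        raise PermissionError("user cannot grant any requested permission")
    return {
        "token": secrets.token_hex(16),
        "permissions": sorted(granted),
        "expires_at": now + ttl,
    }


def valid(creds, now=None):
    """True while the issued credentials have not yet expired."""
    now = now if now is not None else time.time()
    return now < creds["expires_at"]
```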
ofosos | I'll try to build something on the weekend, so I can demo it. Maybe that'll be easier than just text. :) | 17:18 |
ofosos | I'll be afk for the rest of the day, need to enjoy the public holiday some more. Very enjoyable discussion :) | 17:19 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Annotate builds with event id https://review.opendev.org/658895 | 17:28 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Log github requests with annotated events https://review.opendev.org/660800 | 17:28 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Annotate logs around build completion and cancellation https://review.opendev.org/660806 | 17:28 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Annotate logs around build states https://review.opendev.org/661489 | 17:28 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Annotate logs around reporting https://review.opendev.org/661490 | 17:28 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Annotate logs around finished builds https://review.opendev.org/661491 | 17:28 |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: WIP: registry test job https://review.opendev.org/661327 | 17:34 |
openstackgerrit | David Shrewsbury proposed zuul/zuul master: Store autohold requests in zookeeper https://review.opendev.org/661114 | 17:34 |
Shrews | i'm sad zuul tests seem to lack an equivalent of nodepool's CLI tests | 17:39 |
SpamapS | ofosos: I don't know why you'd want a user to be the gateway for credentials. We make policy the gateway. If a job has been granted permissions, it can go forward. We scope down when APIs allow it, like Amazon's STS, where we make a token that is only valid for the life of the job, but a human doesn't do that, a trusted job does it. | 17:40 |
SpamapS | Also we only allow a narrow team of individuals to commit things to the prod branch, so if it's in prod, it has already had a human authorization. | 17:41 |
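The scope-down pattern SpamapS describes can be sketched in plain Python: a trusted job builds a narrowed inline policy that, with boto3, would be passed to `sts.assume_role` as `Policy`/`DurationSeconds` so the token expires with the job. The bucket layout, job id, and helper name here are hypothetical illustrations, not Zuul code.

```python
import json


def scoped_job_policy(bucket, job_id, duration_seconds=3600):
    """Build an inline session policy narrowing a role's permissions
    to a single job's artifact prefix (hypothetical layout)."""
    return {
        "policy": json.dumps({
            "Version": "2012-10-17",
            "Statement": [{
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                # Only this job's prefix is writable with the token.
                "Resource": f"arn:aws:s3:::{bucket}/{job_id}/*",
            }],
        }),
        # The temporary credentials expire with the job.
        "duration": duration_seconds,
    }


# With boto3 this would feed:
#   sts.assume_role(RoleArn=..., RoleSessionName=...,
#                   Policy=p["policy"], DurationSeconds=p["duration"])
p = scoped_job_policy("job-artifacts", "job-123")
```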
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: WIP: registry test job https://review.opendev.org/661327 | 17:44 |
corvus | Shrews: the rpcs are tested, and so far the client has been a thin enough layer on the rpcs that if they work, the cli should too. that's in test_scheduler.py (of course, because they're old and everything is there) -- eg test_autohold | 17:47 |
corvus | Shrews: there is a test_client.py which seems to have one cli executable test | 17:47 |
*** electrofelix has quit IRC | 17:48 | |
corvus | could probably combine the two to make a new test if you didn't feel the rpc-only test was sufficient | 17:48 |
Shrews | corvus: yep, i'm aware of that one. it lacks the framework for testing output though, similar to https://opendev.org/zuul/nodepool/src/branch/master/nodepool/tests/unit/test_commands.py | 17:49 |
*** electrofelix has joined #zuul | 17:50 | |
tobiash | tristanC: I added some thoughts to https://review.opendev.org/590092 | 17:59 |
tobiash | corvus: I'd be curious what you think ^ | 18:01 |
*** electrofelix has quit IRC | 18:03 | |
corvus | tobiash: i think i agree about forwarding zuul_return. i'm not sure about the rest right now. | 18:06 |
tobiash | corvus: thanks | 18:10 |
*** jpena is now known as jpena|off | 18:13 | |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: WIP: registry test job https://review.opendev.org/661327 | 18:17 |
*** pcaruana has joined #zuul | 18:24 | |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: WIP: registry test job https://review.opendev.org/661327 | 18:38 |
*** nickx-intel has joined #zuul | 18:54 | |
nickx-intel | how do I inherit variables from main.yaml > main-task role > leaf-task role ? | 18:55 |
nickx-intel | weird | 19:02 |
nickx-intel | it's erroring because it's finding variablename in - name: but variablename isn't noted by {{}} | 19:03 |
nickx-intel | can't I escape variablename so that it doesn't try to parse - name: "stuff" | 19:03 |
pabelanger | nickx-intel: where is your variable? | 19:03 |
nickx-intel | pabelanger, I have it declared in run.yaml variables | 19:04 |
pabelanger | nickx-intel: you can look to inventory file for it | 19:05 |
pabelanger | it depends how you are setting it | 19:05 |
nickx-intel | hmm | 19:05 |
pabelanger | if you are using set_fact, facts don't persist across ansible-playbook runs | 19:05 |
nickx-intel | does run.yaml vars: use set_facts? | 19:06 |
nickx-intel | implicitly? | 19:06 |
openstackgerrit | David Shrewsbury proposed zuul/zuul master: Store autohold requests in zookeeper https://review.opendev.org/661114 | 19:06 |
pabelanger | nickx-intel: what is run.yaml? | 19:06 |
pabelanger | is that a pre-run / run / post-run playbook? | 19:07 |
pabelanger | in your zuul job | 19:07 |
nickx-intel | it's a run playbook | 19:07 |
pabelanger | nickx-intel: how are you setting the fact? It would only be set_fact, if you called that task | 19:08 |
pabelanger | other wise, if a zuul job variable, that will be stored in the inventory file | 19:08 |
nickx-intel | pabelanger, setting implicitly? idk? it's not an explicitly defined variable assignment. like. it just does like this, | 19:10 |
nickx-intel | vars: | 19:10 |
nickx-intel | key: value | 19:10 |
nickx-intel | key2: value2 | 19:10 |
pabelanger | yah, if that is in your play, that should work | 19:10 |
nickx-intel | does this implicitly call set_fact? | 19:10 |
pabelanger | no | 19:10 |
pabelanger | nickx-intel: https://docs.ansible.com/ansible/latest/user_guide/playbooks_variables.html explains all the fun with variables in ansible | 19:11 |
nickx-intel | do I need to call branch_role(vars) leaf_role(vars) or something? | 19:11 |
pabelanger | nope | 19:11 |
pabelanger | you should be able to call | 19:11 |
pabelanger | - shell: "echo {{ key }}" | 19:12 |
pabelanger | and it works | 19:12 |
pabelanger | in the same play | 19:12 |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: WIP: registry test job https://review.opendev.org/661327 | 19:13 |
nickx-intel | I'm trying to implement include_role: leaf_role | 19:13 |
nickx-intel | but my leaf_role is dumb | 19:13 |
pabelanger | you might need to pass vars into include_role, see: https://docs.ansible.com/ansible/latest/modules/include_role_module.html | 19:14 |
pabelanger | you're likely hitting a scoping issue | 19:14 |
pabelanger | Pass variables to role example in link above | 19:14 |
nickx-intel | yeah that's my apparent position pabelanger, vis branch_role(vars) leaf_role(vars) :) | 19:14 |
nickx-intel | I'll dig more after lunch, I think this is sufficient, thank you pabelanger for confirming my suspicion | 19:15 |
nickx-intel | I'll post my fix after I fix lol :) | 19:16 |
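The resolution the thread converges on can be sketched as a play. Play-level `vars:` are visible to tasks in the same play, and `set_fact` is not involved; for an included role, pass the variables explicitly as pabelanger suggests. Role and variable names below are placeholders:

```yaml
# run.yaml -- a minimal sketch, assuming a role named leaf_role exists
- hosts: all
  vars:
    key: value
    key2: value2
  tasks:
    - name: Echo a play var (works directly in the same play)
      shell: "echo {{ key }}"

    - name: Include a role, passing the vars explicitly
      include_role:
        name: leaf_role
      vars:
        key: "{{ key }}"
        key2: "{{ key2 }}"
```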
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: WIP: registry test job https://review.opendev.org/661327 | 19:24 |
*** tosky has joined #zuul | 19:38 | |
*** rlandy is now known as rlandy|brb | 19:39 | |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: WIP: registry test job https://review.opendev.org/661327 | 19:52 |
*** rlandy|brb is now known as rlandy | 20:08 | |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: WIP: registry test job https://review.opendev.org/661327 | 20:12 |
openstackgerrit | Clark Boylan proposed zuul/zuul master: Update axios version and yarn.lock https://review.opendev.org/662316 | 20:16 |
clarkb | tristanC: one thing I notice about my change at ^ is that it added integrity shas to the packages in my yarn.lock update but we don't seem to have those on the other locked packages | 20:18 |
clarkb | tristanC: does that mean I did something wrong or will it add those optimistically? | 20:19 |
clarkb | reading on the internet seems like older yarn didn't add those and newer yarn does. Maybe the version of yarn used to generate the existing lock file was older? | 20:27 |
clarkb | seems like checking package hashes is a good thing so I don't think I'll try to undo it unless someone says we need to | 20:27 |
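For context on the integrity lines clarkb is seeing: newer yarn (1.10 and later) writes a Subresource-Integrity-style `integrity` field into each lockfile entry, which older yarn omitted. A lockfile entry then looks roughly like this (version and hashes below are illustrative placeholders):

```
axios@^0.19.0:
  version "0.19.0"
  resolved "https://registry.yarnpkg.com/axios/-/axios-0.19.0.tgz#..."
  integrity sha512-...
```

Entries generated by older yarn have only the `resolved` line, which is why a partial update with newer yarn leaves the lockfile with a mix of hashed and unhashed packages.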
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: WIP: registry test job https://review.opendev.org/661327 | 20:32 |
*** pcaruana has quit IRC | 21:18 | |
pabelanger | tobiash: have you seen this error before with github? http://paste.openstack.org/show/752331/ | 21:27 |
openstackgerrit | Clark Boylan proposed zuul/zuul master: Update axios version and yarn.lock https://review.opendev.org/662316 | 21:27 |
openstackgerrit | Clark Boylan proposed zuul/zuul master: Use nodejs v10 in testing https://review.opendev.org/662339 | 21:27 |
clarkb | the axios change failed and in debugging it I noticed we use nodejs 6, 8 and 10 in different jobs | 21:30 |
clarkb | I've tried to make it nodejs 10 across the board in hopes that also fixes my axios problem | 21:30 |
clarkb | but I think we should use nodejs10 regardless | 21:30 |
fungi | likely so | 21:31 |
clarkb | ok new nodejs doesn't fix the axios change issue | 21:38 |
* clarkb generates new yarn.lock from scratch | 21:38 | |
clarkb | after rebuilding venv so that it has nodejs 10 in it | 21:38 |
fungi | web browsers are so last decade anyway | 21:40 |
pabelanger | tobiash: it looks like some PR reviews, don't have commit_id: https://api.github.com/repos/ansible/ansible/pulls/45469/reviews | 21:43 |
pabelanger | but I don't know why | 21:44 |
openstackgerrit | Clark Boylan proposed zuul/zuul master: Update axios version and yarn.lock https://review.opendev.org/662316 | 21:45 |
pabelanger | https://github.com/ansible/ansible/pull/45469/files/1feaf0f2df238cf6788c65c80f08e655891091f6 | 21:45 |
pabelanger | looks to be deleted? | 21:45 |
pabelanger | jlk: when you have spare cycles, I'd be interested in what you think we need to do about pull reviews missing a commit_id, see pb above | 21:46 |
clarkb | pabelanger: maybe that happens if you do a rebase and replace the old commits? | 21:46 |
clarkb | github has in the past not been great about keeping that data around | 21:46 |
pabelanger | clarkb: maybe | 21:47 |
clarkb | it does keep diff contexts now but last I checked the commits are gone | 21:47 |
clarkb | and it is the first 2 comments that don't have it in this case which would fit under that I think | 21:47 |
pabelanger | but looks like we need to update github3.py, because https://github.com/sigmavirus24/github3.py/blob/master/src/github3/pulls.py#L961 is where it is failing | 21:47 |
pabelanger | not sure what we should do in that case | 21:48 |
clarkb | pabelanger: ya seems like it | 21:48 |
pabelanger | I'm not even sure what we are using pullreviews for right now | 21:50 |
pabelanger | I guess for pipeline trigger | 21:51 |
pabelanger | so, in our case, we likely don't care about commit_id | 21:51 |
clarkb | maybe not? if you trigger on approvals or rechecks and expect the commit id to identify what to test, we might; but I think we always use the current state of the PR HEAD, so it is similar to gerrit in that way | 21:53 |
clarkb | ok someone smarter than me will have to figure out the axios bump. The other change https://review.opendev.org/662339 is good to go I expect | 22:01 |
jlk | pabelanger: My thought is that if it's missing a commit_id it gets discarded. But _also_ this looks like a bug in github3.py; always assuming there is a commit_id. That's probably my code. | 22:06 |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: WIP: registry test job https://review.opendev.org/661327 | 22:21 |
pabelanger | clarkb: jlk: my 2min fix: https://github.com/sigmavirus24/github3.py/pull/944 | 22:27 |
pabelanger | will look at tests in a bit to see if we need to add coverage | 22:27 |
jlk | alrighty. I think there might be, but again I wrote most of that so it's possible I didn't do it right. | 22:27 |
jlk | I'm asking internally about this. It should be documented. | 22:27 |
pabelanger | cool, thanks | 22:27 |
clarkb | pabelanger: we may need to update zuul to check for a None commit_id after that goes in but that is probably the right approach | 22:28 |
jlk | Yes, if our assumption is correct (a review for a commit that was force pushed out of the branch) then the proper thing to do is toss the review. | 22:29 |
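The "toss the review" behavior jlk describes can be sketched as a small filter: GitHub apparently omits `commit_id` for reviews whose commit was force-pushed off the branch, so such reviews are discarded rather than processed. This is a hypothetical helper illustrating the logic, not the actual Zuul or github3.py code:

```python
def usable_reviews(reviews):
    """Filter out PR reviews missing a commit_id (hypothetical helper;
    GitHub omits commit_id when the reviewed commit was force-pushed
    off the branch)."""
    kept = []
    for review in reviews:
        if not review.get("commit_id"):
            # Review references a commit no longer on the branch; discard.
            continue
        kept.append(review)
    return kept


reviews = [
    {"id": 1, "commit_id": None},       # commit force-pushed away
    {"id": 2, "commit_id": "1feaf0f"},  # still on the branch
]
assert [r["id"] for r in usable_reviews(reviews)] == [2]
```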
pabelanger | clarkb: yah, seems like a good idea | 22:29 |
pabelanger | doing that patch now | 22:30 |
openstackgerrit | Paul Belanger proposed zuul/zuul master: Discard GitHub PullReview if incomplete https://review.opendev.org/662347 | 22:35 |
pabelanger | clarkb: jlk: believe that is what you are suggesting^ | 22:35 |
openstackgerrit | Paul Belanger proposed zuul/zuul master: Discard GitHub PullReview if incomplete https://review.opendev.org/662347 | 22:37 |
pabelanger | heh | 22:38 |
pabelanger | zuul can't seem to merge depends-on on ^ | 22:38 |
pabelanger | let me remove to confirm | 22:38 |
openstackgerrit | Paul Belanger proposed zuul/zuul master: Discard GitHub PullReview if incomplete https://review.opendev.org/662347 | 22:38 |
pabelanger | HA | 22:45 |
pabelanger | clarkb: zuul.o.o is getting the same commit_id exception on that review | 22:45 |
jlk | odd | 22:46 |
pabelanger | that is why there is a merge failure it seems | 22:46 |
jlk | oh, because it was trying to see if that referenced PR is mergeable yet, by looking at reviews? | 22:46 |
pabelanger | that is what I'm looking at now | 22:46 |
pabelanger | oh, maybe it was another event from github | 22:48 |
pabelanger | I'll have to dig later on zuul.o.o | 22:48 |
tristanC | clarkb: that should be fine, perhaps you need to also bump (or remove) the yarn version from the lock/packages.json | 22:56 |
clarkb | tristanC: it did bump it but is still failing | 23:00 |
tristanC | clarkb: i meant in the packages.json, though i don't know if yarn should install itself or if it's safe to use a global one | 23:09 |
clarkb | ah | 23:11 |
tristanC | clarkb: and it seems like the yarn.lock change bumps versions for un-pinned dependencies like eslint-plugin-react (from 7.11 to 7.13) | 23:11 |
tristanC | which may not be compatible with the pinned one like react-scripts 1.14 | 23:11 |
tristanC | clarkb: perhaps we should try to rebase on https://review.opendev.org/659991 | 23:11 |
openstackgerrit | Clark Boylan proposed zuul/zuul master: Update axios version and yarn.lock https://review.opendev.org/662316 | 23:13 |
clarkb | is that what you mean about the yarn versions? | 23:13 |
clarkb | and ya wouldn't surprise me if we need to update other things and so basing it on that revert might be the way to go | 23:13 |
clarkb | or update the revert to update axios | 23:13 |
*** ianychoi has quit IRC | 23:13 | |
*** rlandy has quit IRC | 23:16 | |
*** panda|ruck has quit IRC | 23:22 | |
*** panda has joined #zuul | 23:23 | |
*** tosky has quit IRC | 23:33 | |
*** tjgresha has joined #zuul | 23:50 | |
*** tjgresha has quit IRC | 23:55 | |
*** tjgresha has joined #zuul | 23:55 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!