openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: test_v3: replace while loop with iterate_timeout https://review.opendev.org/662112 | 00:29 |
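The `iterate_timeout` helper referenced in that change lives in Zuul's test utilities; roughly, it replaces a hand-rolled `while` polling loop with a bounded iterator that raises on timeout. A minimal sketch (the exact signature and sleep interval here are assumptions, not the real implementation):

```python
import time


def iterate_timeout(max_seconds, purpose):
    """Yield a loop counter until max_seconds elapses, then raise.

    Callers write `for _ in iterate_timeout(30, "server to boot"):`
    and break out of the loop once their condition holds; if it never
    holds, the generator raises instead of spinning forever.
    """
    start = time.time()
    count = 0
    while time.time() < start + max_seconds:
        count += 1
        yield count
        time.sleep(0.01)  # assumed pause between polls
    raise Exception("Timeout waiting for %s" % purpose)
```

Usage is the pattern the review replaces: `for _ in iterate_timeout(5, "job to start"): if ready(): break`.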
*** ianychoi has quit IRC | 00:56 | |
pabelanger | clarkb: when you have time to review: https://review.opendev.org/661866/ | 01:38 |
*** threestrands has joined #zuul | 01:56 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: test_v3: replace while loop with iterate_timeout https://review.opendev.org/662112 | 02:45 |
*** rlandy|bbl has quit IRC | 03:42 | |
*** threestrands has quit IRC | 03:45 | |
*** threestrands has joined #zuul | 04:05 | |
*** threestrands has quit IRC | 04:06 | |
*** raukadah is now known as chandankumar | 04:31 | |
*** saneax has joined #zuul | 04:57 | |
*** pcaruana has joined #zuul | 05:00 | |
openstackgerrit | Mark Meyer proposed zuul/zuul master: Extend event reporting https://review.opendev.org/662134 | 06:02 |
*** bjackman has joined #zuul | 06:44 | |
*** ianychoi has joined #zuul | 06:57 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: executor: run cleanup playbook on stop https://review.opendev.org/661881 | 07:27 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: docs: add cleanup-run documentation https://review.opendev.org/662147 | 07:27 |
*** jpena|off is now known as jpena | 07:36 | |
*** toabctl has quit IRC | 07:50 | |
openstackgerrit | Andriy Shevchenko proposed x/pbrx master: Update home-page https://review.opendev.org/630132 | 08:44 |
*** bjackman has quit IRC | 08:45 | |
*** bjackman has joined #zuul | 08:46 | |
*** saneax has quit IRC | 08:47 | |
*** panda is now known as panda|ruck | 09:17 | |
openstackgerrit | Slawek Kaplonski proposed zuul/zuul-jobs master: Add role to fetch journal log from test node https://review.opendev.org/643733 | 09:37 |
*** electrofelix has joined #zuul | 09:44 | |
openstackgerrit | Slawek Kaplonski proposed zuul/zuul-jobs master: Add role to fetch journal log from test node https://review.opendev.org/643733 | 10:42 |
*** bjackman_ has joined #zuul | 10:51 | |
*** bjackman has quit IRC | 10:54 | |
*** jpena is now known as jpena|lunch | 11:02 | |
*** tosky has joined #zuul | 12:14 | |
*** jpena|lunch is now known as jpena | 12:25 | |
*** sshnaidm|off has quit IRC | 12:32 | |
*** rlandy has joined #zuul | 12:33 | |
*** sshnaidm has joined #zuul | 12:49 | |
openstackgerrit | Mark Meyer proposed zuul/zuul master: Build a slack integration https://review.opendev.org/662208 | 13:11 |
*** ofosos has joined #zuul | 13:14 | |
ofosos | The Slack integration is very much WIP; it lacks docs, a connection interface, and probably a lot more. I'll polish it by Monday. | 13:15 |
pabelanger | there is some interest from ansible network folks in a slack reporter, so I might take a peek | 13:16 |
ofosos | don't, it's crap right now. We have a public holiday and I'm on-call and just hacked something together | 13:16 |
ofosos | It currently lacks a reporter; it's a chat bot that you can tell to run pipelines | 13:17 |
ofosos | The reporter is next, but not today :) | 13:17 |
ofosos | Probably on the weekend. | 13:17 |
pabelanger | I can't comment on the trigger option; I know there has been some discussion in the past about users being able to do that more freely, but I haven't been following it | 13:18 |
pabelanger | but reporter is of some interest | 13:18 |
AJaeger | ofosos: check also https://review.opendev.org/536391 for a previous attempt and see discussion there | 13:20 |
*** pcaruana has quit IRC | 13:23 | |
ofosos | AJaeger: thanks for the pointer | 13:23 |
ofosos | essentially I need both a trigger and a reporter, I want to run infrastructure repos in zuul | 13:24 |
fungi | as part of a team who runs infrastructure repos in zuul, i wonder what the desire for a manually-triggered pipeline is, unless the idea is to be able to drive things like maintenance activities with zuul and not have the build associated with a particular code event? | 13:35 |
AJaeger | even those you can trigger with an "approve" of a change. A manually-triggered post pipeline (queue everything and release at a certain time) comes to mind... | 13:36 |
*** bjackman_ has quit IRC | 13:36 | |
fungi | yeah, i mean you could trigger off approval of a code-reviewed maintenance plan in a planning repository or something, i suppose | 13:37 |
AJaeger | exactly | 13:38 |
fungi | but i can see the allure of starting the upgrade-firmware-on-all-the-ethernet-switches job on demand, when there's a sufficient critical mass of other sysadmins on hand to deal with any fallout... it's just that zuul's model is built around running sets of builds for events related to a git repository (even the periodic pipeline trigger expects a repository associated with any buildset) | 13:39 |
ofosos | How is this different from the TimeTrigger implementation? | 13:43 |
ofosos | I think I basically worked off that piece of code | 13:43 |
fungi | it would still be relative to the state of a particular repository and run a statically-defined set of builds, if it's like the timer trigger | 13:44 |
ofosos | What we want to do is specify the ideal state in a repo, gate it and then roll it out. | 13:44 |
ofosos | But infrastructure has the tendency to degrade and we might need a way to manually trigger a deployment, without a code change | 13:45 |
fungi | ahh, and the "roll it out" part would be triggered by a human instead of the timer | 13:45 |
fungi | or instead of happening immediately after merging | 13:45 |
ofosos | Kind of, the default is to roll it out as part of the gate | 13:45 |
ofosos | But, if stuff breaks, we need to redeploy | 13:45 |
fungi | ahh, so for rerunning | 13:46 |
ofosos | So we'll have a `deployment.yaml' which specifies the state that the system should be in | 13:46 |
fungi | there is the zuul rpc command-line utility, which has an enqueue-ref subcommand | 13:46 |
ofosos | And rerunning will just pick up any drift and smooth it out | 13:46 |
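The rerun-to-smooth-out-drift idea described here is idempotent convergence: diff the declared state against what is actually running and apply only the difference, so a rerun with no drift is a no-op. A toy sketch (the actual `deployment.yaml` schema is not shown in the discussion, so the dicts here are stand-ins):

```python
def plan(desired, observed):
    """Return the actions needed to converge observed state onto desired.

    `desired` is what deployment.yaml declares; `observed` is what is
    actually running. Rerunning the deploy job just recomputes and
    applies this plan, which is what picks up and smooths out drift.
    """
    actions = []
    for name, spec in desired.items():
        if name not in observed:
            actions.append(("create", name, spec))
        elif observed[name] != spec:
            actions.append(("update", name, spec))
    for name in observed:
        if name not in desired:
            actions.append(("delete", name))
    return actions
```

The useful property for the manual-retrigger case: running `plan` twice against a converged system yields an empty action list, so reruns are safe.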
fungi | we frequently rely on that for rerunning things... so i guess this interface would be similar? | 13:47 |
ofosos | Wait... | 13:47 |
*** pcaruana has joined #zuul | 13:47 | |
ofosos | https://imgur.com/a/f6LCgAx | 13:49 |
ofosos | Works like this | 13:49 |
ofosos | In this case the pipeline is just named check, but could be anything | 13:49 |
fungi | so functionally similar to https://zuul-ci.org/docs/zuul/admin/client.html#enqueue-ref | 13:49 |
ofosos | Yes | 13:50 |
pabelanger | ofosos: I've had good success with promote pipeline and periodic pipeline. After gate, promote runs. | 13:50 |
pabelanger | and if that fails for some reason, periodic will then run | 13:50 |
pabelanger | and hopefully fix | 13:50 |
ofosos | We don't do any magic parameters that are outside git, we just need a way to rerun things | 13:50 |
fungi | i'm guessing your "check" pipeline is ref-oriented and not change-oriented, or else running a buildset on a git head wouldn't be doable | 13:50 |
corvus | if the jobs don't make too many assumptions about zuul.* vars, both might be okay | 13:51 |
fungi | mmm, good point | 13:52 |
ofosos | I'm just sitting in a wood workshop and fooling around, we don't have such intricate pipelines yet. I'll have to see how it works out tomorrow. | 13:53 |
ofosos | Right now it's just in a proof-of-concept state. | 13:53 |
fungi | in our (opendev deployment) case the closest equivalent would probably be either using `zuul enqueue ...` on our promote pipeline or `zuul enqueue-ref ...` on our post pipeline to rerun (run a new buildset for) the jobs which originally ran after a change successfully merged | 13:54 |
corvus | but yeah, part of the refactor into triggers/sources/reporters in v3 was to accomodate this kind of abstraction -- so i think it should work out | 13:54 |
fungi | or, well, not necessarily the jobs which originally ran but the jobs which are configured to run currently, which is usually the same (but sometimes it's not, and sometimes that's also why we want to reenqueue them) | 13:55 |
fungi | well, anyway, my point was that the same could *probably* be accomplished by a chatbot which ran zuul rpc subcommands or implemented the same rpc client interfaces | 13:56 |
fungi | with the right sort of socket configuration and protections you could probably even put it on a separate machine from the scheduler | 13:57 |
fungi | (might need a trivial proxy to go from a tcp socket to a named pipe) | 13:58 |
pabelanger | fungi: I know a while back, maybe gozer folks, talked about zuul client being able to do remote rpc commands (so we didn't need to expose ssh to users). | 14:00 |
clarkb | it uses gearman iirc | 14:05 |
corvus | there's a lot of opportunity for ux improvement with chat triggers/reporters -- if we add an irc bot, we could "recheck 661627" for example. | 14:06 |
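A chat trigger along these lines is mostly command parsing in front of the existing enqueue RPC. A hypothetical sketch of the parsing half (the request field names and defaults below are made up for illustration, not Zuul's actual RPC schema):

```python
import re

# Matches "recheck 661627" or "recheck 661627,3" (change,patchset).
RECHECK = re.compile(r"^recheck\s+(?P<change>\d+)(?:,(?P<patchset>\d+))?$")


def parse_command(line, tenant="example", pipeline="check"):
    """Turn a chat message into an enqueue request dict, or None.

    A real bot would hand the resulting dict to Zuul's RPC client
    (the same path the enqueue/enqueue-ref subcommands use).
    """
    m = RECHECK.match(line.strip())
    if not m:
        return None
    change = m.group("change")
    patchset = m.group("patchset") or "1"  # assumed default
    return {
        "tenant": tenant,
        "pipeline": pipeline,
        "change": "%s,%s" % (change, patchset),
    }
```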
corvus | if you're interested in the user-accessed-rpc approach, see the web-admin api spec | 14:06 |
*** chandankumar is now known as raukadah | 14:13 | |
SpamapS | ofosos:FYI, I have had the same desires as you for manual triggering, and I've found empty commits work better. | 14:14 |
SpamapS | ofosos:the only problem is using files matchers and such, which don't trigger, and I've often thought that a header like `Ignore-Matchers: files` in the commit message would be a nice feature anyway. | 14:15 |
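The `Ignore-Matchers:` header floated here does not exist in Zuul; if it did, extracting it from a commit message might look like this sketch (header name and comma-separated semantics are the hypothetical ones from the message above):

```python
def ignored_matchers(commit_message):
    """Return the set of matcher names an Ignore-Matchers footer disables.

    With this hypothetical feature, a files matcher would be skipped
    when "files" appears in the returned set, letting an empty or
    trivial commit trigger jobs it otherwise would not.
    """
    ignored = set()
    for line in commit_message.splitlines():
        if line.lower().startswith("ignore-matchers:"):
            value = line.split(":", 1)[1]
            ignored.update(m.strip() for m in value.split(",") if m.strip())
    return ignored
```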
ofosos | SpamapS: I'm not really taken by pushing an empty commit and opening a PR based on that. And I'm not sure what Bitbucket will do when I try to open a PR with an empty commit. | 14:18 |
SpamapS | ofosos: GitHub and Gerrit handle it fine. | 14:18 |
SpamapS | Worst case, you tack a line into a file, manual_runs.txt. | 14:18 |
SpamapS | Make a script, like `date >> manual_runs.txt && git add manual_runs.txt && git commit -m "manual_run by $USER" && git push origin manual-run-$USER-$(date +%Y%m%d%H%M%S) && bitbucket-client-open-thing` | 14:20 |
SpamapS | Point being, it's actually *immensely* valuable to have *everything* you ever did linked to git. | 14:20 |
ofosos | How do you handle versioning with this? I.e. I want to have well defined versions on master and ideally only roll those out. | 14:21 |
SpamapS | Especially if you have change management controls, the PR/review/whatever-bitbucket-calls-it becomes your paperwork. | 14:21 |
SpamapS | ofosos:tag the new commit? | 14:21 |
SpamapS | I've actually given up on human-defined versions. My devs tag the repo, but everything is tied to the Zuul build UUID, which links to the git commit, so the versions are just a human-readable summary of important commits. | 14:23 |
ofosos | If some piece of infra broke down, why would this justify having a new version of the software? With infrastructure repos this might seem ok, but if I have a joint repo for infra & software it looks foreign | 14:23 |
SpamapS | It's pretty common in practice to have "rebuild" versions of software. | 14:25 |
SpamapS | But, you can always have a repo that is just for triggering. | 14:25 |
corvus | SpamapS: i think at this point we could all learn from a conference presentation on your build/deployment practices. :) | 14:25 |
SpamapS | corvus: :-D | 14:26 |
SpamapS | I should probably submit an abstract for Shanghai eh? ;) | 14:26 |
SpamapS | ofosos: so yeah, one interesting thing you can do with Zuul is attach jobs to repos that aren't the main focus of the job. So you could have a deploy job that requires the 'manual-triggers' project, which is just for recording manual triggers. | 14:29 |
SpamapS | And don't think I haven't tried Slack integration in a similar fashion. :) | 14:29 |
ofosos | So how does the manual-triggers project look? | 14:30 |
ofosos | like | 14:30 |
SpamapS | We carried an experimental patch on our Zuul at GoDaddy for a while. But ultimately, I found that git was still the better way to trigger, and I reverted the slack patch, and wrote a slack-notify role. | 14:30 |
ofosos | Is that slack-notify role somewhere available? | 14:31 |
SpamapS | ofosos: README and maybe a zuul.yaml. | 14:31 |
SpamapS | ofosos:it's been in review forever.. let me dig out the link | 14:31 |
SpamapS | https://review.opendev.org/623594 | 14:31 |
SpamapS | Testing it proved.. complicated. ;) | 14:32 |
SpamapS | Though I think I mostly just needed to change the test slack to something random so we didn't accidentally migrate opendev to slack. ;) | 14:32 |
ofosos | Manual triggers would be interesting, I'd like to loop in a manual trigger, since that might allow me to delegate credentials from the user that wants to run the pipeline to the build system. | 14:32 |
ofosos | That would be a different approach to constraining the build job to purpose built user credentials. | 14:34 |
SpamapS | Indeed! | 14:34 |
SpamapS | ofosos: happy to help you hash it out.. hopefully I've steered you in a happy direction. Have to run for a while. | 14:35 |
ofosos | Have a good run! I'll take care of my beechwood box and let the ideas percolate. | 14:36 |
*** zbr_ has joined #zuul | 15:06 | |
pabelanger | clarkb: corvus: tobiash: do you mind adding https://review.opendev.org/660856/ to your review pipeline, that is tristanC patch to skip file matcher on timer trigger pipelines. Would like to get your eyes on it please | 15:07 |
*** zbr has quit IRC | 15:09 | |
ofosos | Interestingly this need arose, when I talked to our devs. | 15:12 |
ofosos | SpamapS: how do you deploy an older version? | 15:13 |
clarkb | revert probably | 15:15 |
SpamapS | ofosos: clarkb is correct. The HEAD is what we deploy. Always. | 15:17 |
clarkb | that ensures you have history of the rollback which is nice | 15:18 |
SpamapS | Exactly. | 15:18 |
SpamapS | Rollbacks are changes. | 15:18 |
pabelanger | +1 | 15:18 |
ofosos | Sounds good | 15:18 |
SpamapS | I will say... the git->build->test->upload->deploy->test pipeline is too slow for prod, so there are hot-rollback procedures. | 15:18 |
SpamapS | For instance, if we can get back to a steady state by just rolling back a Kubernetes deployment, we do that. | 15:19 |
SpamapS | But for the most part, if it can wait 15 minutes, it goes through git. | 15:19 |
SpamapS | Looking in to more automatic ways to do that, like Spinnaker's canary deploys. | 15:20 |
SpamapS | Also I realized yesterday our stack is Kubernetes, Ansible, Terraform, Zuul... so.. we herd KATZ | 15:20 |
pabelanger | SpamapS: yah, I'm curious how often people use the UI for k8s / openshift to scale stuff up / down | 15:20 |
pabelanger | over say, gitops | 15:21 |
SpamapS | pabelanger: scaling should be handled by the pod autoscaler and AWS autoscaling groups. In theory. ;) | 15:21 |
SpamapS | The plumbing on that may have a few "TODO" comments. ;) | 15:21 |
SpamapS | But in general, scaling should always be in response to real data. | 15:22 |
SpamapS | Our git config just sets a baseline, which we try to make "10X more than normal traffic" if we can afford it. | 15:22 |
ofosos | Hmm, we're planning on having an entire blue/green cycle inside the gate pipeline including checking Splunk and Datadog. It feels like this will run for quite some time, especially because we're doing multi-region deployments. Any better ideas? | 15:25 |
SpamapS | ofosos:That probably belongs in a promote style pipeline, not gate. | 15:26 |
SpamapS | promote generally is tied to close+merge events, so that git reflects your intended state at any given time. | 15:26 |
SpamapS | That's how we do it anyway. gate is for validating the proposed git state, and staging artifacts. | 15:27 |
ofosos | SpamapS: can you point me to an example of how this looks in zuul? | 15:27 |
SpamapS | I think it could work to deploy in gate though. Haven't thought about that. | 15:27 |
pabelanger | yah, agree with SpamapS, we've been doing promote too for production things | 15:27 |
pabelanger | would be awkward in gate, in case that change didn't merge properly | 15:28 |
ofosos | What happens if a deployment fails in promote? | 15:29 |
SpamapS | ofosos: http://paste.openstack.org/show/752305/ | 15:29 |
SpamapS | that's our pipeline config | 15:30 |
SpamapS | Note that we don't actually use the `post` pipeline anymore. | 15:30 |
SpamapS | ofosos: promote fails notify slack, and often trigger monitors before that. ;) | 15:31 |
SpamapS | in theory they could comment on the PR too, but we don't do that. | 15:31 |
pabelanger | https://github.com/ansible/project-config/blob/master/zuul.d/pipelines.yaml#L96 | 15:32 |
ofosos | But then you're in a state where `master' (or whatever) will not result in a viable deployment. | 15:32 |
* SpamapS utterly failed at anonymizing that paste. :-P | 15:32 | |
pabelanger | that is our promote, based on SpamapS one | 15:32 |
pabelanger | we do comment too | 15:33 |
pabelanger | I find that helpful, in case promote doesn't work for some reason | 15:33 |
*** pcaruana has quit IRC | 15:33 | |
SpamapS | ofosos:correct! But that's a 3-alarm fire, and generally we have to decide whether to revert or handle urgently. | 15:33 |
SpamapS | if it happened in gate, we wouldn't actually have the state that resulted in the problem | 15:34 |
SpamapS | Since a gate fail would just reset, and the next thing in the queue would start deploying. | 15:34 |
SpamapS | and TBH, promote jobs don't always detect the failures | 15:34 |
SpamapS | Our promote job fails to wait for all of the things it started to finish, for instance. | 15:35 |
SpamapS | So we have to fall back on monitoring to alert us to that fail. | 15:35 |
SpamapS | This is great btw | 15:35 |
SpamapS | you are all writing my talk for me. | 15:35 |
SpamapS | ;) | 15:35 |
pabelanger | yah, I'd be interested to shadow your deployment for a day or so, to see how it all works :) | 15:36 |
corvus | SpamapS: "so then ofosos asked '...' and i said '...' and then pabalanger was like '...'!" | 15:37 |
fungi | i want to say corvus did a conference presentation involving opendev's promote pipeline usage model in the past couple of weeks | 15:37 |
fungi | i can't recall where that was merged though | 15:37 |
SpamapS | It's pretty boring. We deploy like, 2 python API's, some 3rd-party stuff, a frontend website, and a bunch of AWS plumbing with terraform. | 15:37 |
corvus | fungi: that was mostly focused on the k8s stuff, only incidental mention of zuul | 15:37 |
fungi | (i think i approved the addition in git though, so shame on my fallible memory) | 15:37 |
SpamapS | corvus: and he was like "shuuttt uuuup" and I was like "whaaatever". ;) | 15:37 |
fungi | corvus: ahh, okay | 15:38 |
corvus | SpamapS: totally krad talk. | 15:38 |
SpamapS | bruh, do you even zuul? | 15:38 |
pabelanger | SpamapS: corvus: I'd totally come to a talk about our irc discussions, and the solutions that came from them :) | 15:39 |
SpamapS | The more interesting work is where people keep reacting violently to Zuul's model and asking "what do we do when the build breaks?" ;) | 15:39 |
clarkb | SpamapS: semi related elsewhere I saw comments about "lets just merge this because we can wait around to fix the zuul gate" | 15:39 |
SpamapS | clarkb:can't? | 15:40 |
clarkb | er ya | 15:40 |
SpamapS | :) | 15:40 |
clarkb | basically they didn't understand that you can't merge unless the gate passes | 15:40 |
clarkb | it is a learning experience for many | 15:40 |
SpamapS | Yeah luckily our gate runs about 15 minutes, and we don't do clean-check, so I haven't had any "skip the gate" conversations as yet. | 15:40 |
pabelanger | clarkb: oh, yah, that happened recently for us too. The hard part right now, is humans still have admin access to repos zuul runs on. It has been difficult asking them to stop doing that workflow | 15:41 |
SpamapS | Oddly enough I also haven't had any "hey it's amazing master always works" compliments yet. Ungrateful devs. | 15:41 |
SpamapS | ofosos: one thing I haven't mentioned yet. We deploy master to our staging environment, but we have a separate branch, called prod, that we use for production. The staging environment is used as a buffer in case there are things people want to visually verify, etc. | 15:42 |
ofosos | SpamapS: interesting detail | 15:43 |
SpamapS | And lately I've had to yell at people to stop doing manual API testing there and write real tests for the gate. "SHIFT LEFT!" I scream into the void. | 15:43 |
ofosos | SpamapS: do you do canary or blue/green on any of the Terraform stuff? | 15:43 |
SpamapS | Nearly every failure we've had in deploy can be traced to things like "Wrong API key in prod config." or "Visual/Legal-review-needed detail missed in staging." | 15:44 |
pabelanger | clarkb: could I get a review on https://review.opendev.org/661866/ wouldn't mind seeing if we could land that | 15:44 |
*** tosky has quit IRC | 15:44 | |
SpamapS | ofosos:no, we just apply and slurp outputs. | 15:45 |
ofosos | We're in the CloudFormation 'rollback failed' nightmare camp | 15:45 |
clarkb | pabelanger: yes | 15:46 |
SpamapS | I refuse to use CloudFormation. There are real reasons, and also personal reasons, for that. ;) | 15:46 |
clarkb | pabelanger: see my question in #openstack-infra about ansible things if you have a moment too please :) | 15:46 |
ofosos | So no updates, just re-creates. Despite that, we want to roll out VPCs and the like with this, so there are really not going to be updates | 15:46 |
* SpamapS shoots an appreciative but worried glance at the Heat dev team. ;) | 15:46 | |
ofosos | It's not a nice experience, I agree with that. | 15:47 |
SpamapS | ofosos: With terraform we do actually sometimes apply before deploy, and we're talking about storing the plan in git so we don't have surprises. | 15:47 |
SpamapS | But so far none of that has bit us. | 15:47 |
pabelanger | clarkb: replied | 15:48 |
pabelanger | also, relocating network here is terrible | 15:48 |
SpamapS | ofosos: Terraform and CloudFormation are both very very powerful, with emergent behaviors if you don't rein them in. | 15:48 |
SpamapS | IMO nobody should ever use CloudFormation now that Terraform exists. | 15:48 |
fungi | SpamapS: nobody ever notices when things are always working. need to occasionally break stuff to get their attention ;) | 15:49 |
SpamapS | We plumb AWS[VPC, EC2, ELB, RDS] -> CloudFlare -> Kubernetes -> StatusCake all with terraform. I can't imagine how much bash/python/garbage code we'd have to write without it. | 15:49 |
clarkb | pabelanger: I had put it on the backburner after the request for a test but now see shrews feels that isn't necessary (I think it wouldn't hurt to have one but also agree seems unnecessary) | 15:50 |
ofosos | I arrived at this company with the impression that this discussion had already been settled, and people here still believe that terraform is somehow inferior | 15:50 |
clarkb | SpamapS: do you ipv6 with elb? can you maybe help docker to do the same with docker hub :) | 15:51 |
SpamapS | clarkb: nope, CloudFlare does all our ipv6. | 15:51 |
SpamapS | But AFAIK elbs always get an AAAA record and an ipv6 address. | 15:52 |
fungi | i guess dockerhub needs a cdn | 15:52 |
clarkb | docker hub uses cloudflare to serve the fs layer objects | 15:52 |
SpamapS | so if they're failing to republish the AAAA they're just lazy. | 15:52 |
fungi | huh... i wonder why the dockerhub elb lacks aaaa then | 15:52 |
clarkb | but the index is served behind elb | 15:52 |
fungi | yeah, could be the lazy | 15:52 |
SpamapS | clarkb:weird, I wonder why they wouldn't want to CloudFlare the index. | 15:53 |
clarkb | SpamapS: it leads to fun caching proxy rules | 15:53 |
clarkb | certainly would be easier for us if it was all at a single location | 15:53 |
SpamapS | hm I was wrong, you have to turn on ipv6 on classic ELB | 15:55 |
ofosos | Hmm, I'm wondering how to do a multi-region promote. Does it make sense to have the promote job run with a delay between regions? | 15:57 |
SpamapS | ofosos:I do a multi-region promote. I think at that point, it's just like any other automated deploy. If you need a delay, do a delay. We do them one after the other. | 15:59 |
ofosos | sounds reasonable | 15:59 |
ofosos | If one of them fails, do you rollback all previous deploys? | 16:00 |
SpamapS | But I actually would really like to hand this all off to Spinnaker and use some of their awesome primitives. | 16:00 |
SpamapS | ofosos:no, we just explode and notify. Some of our stuff self-heals though. The kubernetes deploys for instance do a good job of detecting readiness and not destroying working pods. | 16:01 |
SpamapS | I'm focused more on self-healing than auto-rollback. | 16:02 |
SpamapS | Which is where I want to get Spinnaker canaries involved. | 16:02 |
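The one-region-after-the-other promote with explode-and-notify (and no automatic rollback of earlier regions) that SpamapS describes can be sketched as follows; `deploy` and `notify` are caller-supplied stand-ins, not real APIs:

```python
def promote(regions, deploy, notify):
    """Deploy to each region in order; on failure, notify and stop.

    Earlier regions are left deployed (no auto-rollback), matching
    the "we just explode and notify" behavior: the return value or
    the notification records how far the rollout got.
    """
    done = []
    for region in regions:
        try:
            deploy(region)
        except Exception as e:
            notify("promote failed in %s after %s: %s" % (region, done, e))
            raise
        done.append(region)
    return done
```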
*** rfolco has quit IRC | 16:06 | |
fungi | our situation is probably a lot different, but we're performing more and more full-stack tests by using our deployment ansible playbooks to deploy test copies of sections of the infrastructure in virtual machines and exercise it | 16:09 |
fungi | before we approve, heck before we even review those modifications | 16:09 |
*** rfolco has joined #zuul | 16:11 | |
*** electrofelix has quit IRC | 16:16 | |
openstackgerrit | Merged zuul/nodepool master: Add error handling when cleaning up resources https://review.opendev.org/661866 | 16:18 |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: WIP: registry test job https://review.opendev.org/661327 | 16:20 |
SpamapS | fungi:that's exactly what we do too | 16:20 |
fungi | cool! | 16:21 |
fungi | so it's not just us then | 16:21 |
SpamapS | check does what it can w/o secrets. gate does more with secrets. It's a pretty tight funnel, and most of what makes it through does what the programmer/automator intended. | 16:21 |
SpamapS | For instance, in check, we deploy all of our kubernetes things into a minikube. | 16:22 |
SpamapS | In gate, we can stick them into a real k8s cluster because now we have creds in secrets. | 16:22 |
fungi | we use fake credentials in check/gate and just stand up copies of the additional services we want the service we're testing to interact with | 16:22 |
*** electrofelix has joined #zuul | 16:22 | |
SpamapS | fungi:one big difference for us, is that we have ~20 3rd party API's to deal with, so we can't stand up fakes. | 16:22 |
SpamapS | But in gate, we have fake-ish account creds to run tests with. | 16:23 |
ofosos | Hmm, hmm, hmm. The glue of my box is setting... | 16:23 |
fungi | but yeah, that doesn't catch possible differences on long-running persistent systems, and we also eschew proprietary software/services | 16:23 |
fungi | so our free software ideals help us out there | 16:23 |
SpamapS | I swim in a sea of OPP (other peoples programs). ;) | 16:23 |
fungi | we do rely on lots of opp, it's just opflossp | 16:24 |
ofosos | I still like to have some knobs: I'd like to model my playbooks in a way that I can trigger a hot rollback in some easy/general way and I'd like a knob for passing credentials to the promote job. I think I need to build another box to mull this through :) | 16:25 |
SpamapS | ofosos: one way to model this in zuul is to make a fast-track pipeline. I did that at GoDaddy, where certain things would trigger things to run w/o long tests. | 16:35 |
SpamapS | Like, a label on a PR, or a specific hot-fix branch. | 16:35 |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: WIP: registry test job https://review.opendev.org/661327 | 16:39 |
ofosos | SpamapS: what do you think about 'a thing' that uses AWS cognito to log you into a site and then passes your role/permissions/credentials to a job to execute with? I think that would be fairly easy to realize. | 16:42 |
SpamapS | ofosos: I think that might be fine. I choose to run everything through git though. ;) | 16:46 |
SpamapS | I am immune to eye rolls though. | 16:46 |
ofosos | Yeah, but then you end up with a) a lot of roles that can do a lot of stuff, in sum allowing zuul to do everything; or b) outright allowing zuul to do everything. Both are kind of bad, IMO. | 16:50 |
ofosos | This could be part of a promote pipeline, i.e. the pipeline requests credentials from a credential provider. | 16:51 |
SpamapS | ofosos:the roles aren't what enables things, the secrets are. And those should be tightly coupled to whatever lets people approve/merge changes. | 16:51 |
fungi | zuul's secrets model has had tons of thought put into its design specifically to allow these use cases, so that you don't need to have a separate secrets store for your jobs to authenticate to and fetch from | 16:52 |
ofosos | Having fixed credentials is kind of bad. I'd prefer to operate with temporary credentials, irrespective of how much brainpower people put into managing the fixed ones. | 16:55 |
fungi | ahh, so you have some separate system create new credentials on the fly and authorize them in the relevant services and hand those to jobs and then revoke them when the build completes? | 16:56 |
fungi | i guess it just depends on where you put that trusted central authority. in our case zuul is our trusted central authority for such purposes | 16:57 |
ofosos | Yep, that's right. But mostly they'll just time out. | 16:57 |
fungi | also the job needs some way to authenticate to the credential broker, so the fixed credentials it uses to authenticate to the broker become the new authority, in effect | 16:59 |
fungi | i suppose it does though give you the ability to insta-revoke access from zuul jobs to all systems by just revoking the credentials it uses to interface with the credential broker, rather than needing to individually revoke various credentials which were in the job secrets | 17:03 |
ofosos | The workflow would be: Job requests credentials, user logs into web ui and grants access with their credentials, job is passed a set of credentials which don't renew. Ideally the job could sign that request, so the user is presented with information about which system is requesting their permissions. | 17:03 |
ofosos | This is for prod; for test, we'll likely use a more relaxed policy. | 17:04 |
fungi | do the user's credentials also time out? otherwise what's to prevent the system from caching/saving and reusing them? | 17:08 |
fungi | i suppose otp could work there | 17:08 |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: WIP: registry test job https://review.opendev.org/661327 | 17:08 |
ofosos | We'll just generate temporary credentials based on the users permission level. | 17:11 |
fungi | and then the user enters those? | 17:15 |
fungi | or the user is entering durable credentials which the job then uses to obtain temporary credentials? | 17:15 |
ofosos | Nope, the user is authenticated, we check their permissions and generate appropriate temporary credentials based on their permission level to pass on to the job. | 17:15 |
ofosos | The user has to authenticate in some way. I think with our setup, this will likely be oauth. | 17:16 |
ofosos | The `thing' (service) just makes sure that the job never has durable credentials. | 17:16 |
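The broker ofosos sketches could be reduced to something like the following (all names, the permission model, and the TTL are illustrative assumptions; the real service would sit behind OAuth/Cognito and talk to the job over a signed request):

```python
import secrets
import time


def issue_credentials(user_permissions, requested, ttl=900, now=None):
    """Issue short-lived credentials scoped to a user's permissions.

    The user authenticates out of band; the service intersects what
    the job requested with what the user may grant, so the job only
    ever sees a non-renewing token that expires after ttl seconds.
    """
    now = now if now is not None else time.time()
    granted = set(requested) & set(user_permissions)
    if not granted:
        raise PermissionError("user cannot grant any requested permission")
    return {
        "token": secrets.token_hex(16),
        "permissions": sorted(granted),
        "expires_at": now + ttl,
    }


def valid(creds, now=None):
    """True while the issued credentials have not yet expired."""
    now = now if now is not None else time.time()
    return now < creds["expires_at"]
```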
ofosos | I'll try to build something on the weekend, so I can demo it. Maybe that'll be easier than just text. :) | 17:18 |
ofosos | I'll be afk for the rest of the day, need to enjoy the public holiday some more. Very enjoyable discussion :) | 17:19 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Annotate builds with event id https://review.opendev.org/658895 | 17:28 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Log github requests with annotated events https://review.opendev.org/660800 | 17:28 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Annotate logs around build completion and cancellation https://review.opendev.org/660806 | 17:28 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Annotate logs around build states https://review.opendev.org/661489 | 17:28 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Annotate logs around reporting https://review.opendev.org/661490 | 17:28 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Annotate logs around finished builds https://review.opendev.org/661491 | 17:28 |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: WIP: registry test job https://review.opendev.org/661327 | 17:34 |
openstackgerrit | David Shrewsbury proposed zuul/zuul master: Store autohold requests in zookeeper https://review.opendev.org/661114 | 17:34 |
Shrews | i'm sad zuul tests seem to lack an equivalent of nodepool's CLI tests | 17:39 |
SpamapS | ofosos: I don't know why you'd want a user to be the gateway for credentials. We make policy the gateway. If a job has been granted permissions, it can go forward. We scope down when APIs allow it, like Amazon's STS, where we make a token that is only valid for the life of the job, but a human doesn't do that, a trusted job does it. | 17:40 |
SpamapS | Also we only allow a narrow team of individuals to commit things to the prod branch, so if it's in prod, it has already had a human authorization. | 17:41 |
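The scope-down pattern SpamapS describes can be sketched in plain Python: a trusted job builds a narrowed inline policy that, with boto3, would be passed to `sts.assume_role` as `Policy`/`DurationSeconds` so the token expires with the job. The bucket layout, job id, and helper name here are hypothetical illustrations, not Zuul code.

```python
import json


def scoped_job_policy(bucket, job_id, duration_seconds=3600):
    """Build an inline session policy narrowing a role's permissions
    to a single job's artifact prefix (hypothetical layout)."""
    return {
        "policy": json.dumps({
            "Version": "2012-10-17",
            "Statement": [{
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                # Only this job's prefix is writable with the token.
                "Resource": f"arn:aws:s3:::{bucket}/{job_id}/*",
            }],
        }),
        # The temporary credentials expire with the job.
        "duration": duration_seconds,
    }


# With boto3 this would feed:
#   sts.assume_role(RoleArn=..., RoleSessionName=...,
#                   Policy=p["policy"], DurationSeconds=p["duration"])
p = scoped_job_policy("job-artifacts", "job-123")
```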
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: WIP: registry test job https://review.opendev.org/661327 | 17:44 |
corvus | Shrews: the rpcs are tested, and so far the client has been a thin enough layer on the rpcs that if they work, the cli should too. that's in test_scheduler.py (of course, because they're old and everything is there) -- eg test_autohold | 17:47 |
corvus | Shrews: there is a test_client.py which seems to have one cli executable test | 17:47 |
*** electrofelix has quit IRC | 17:48 | |
corvus | could probably combine the two to make a new test if you didn't feel the rpc-only test was sufficient | 17:48 |
Shrews | corvus: yep, i'm aware of that one. it lacks the framework for testing output though, similar to https://opendev.org/zuul/nodepool/src/branch/master/nodepool/tests/unit/test_commands.py | 17:49 |
*** electrofelix has joined #zuul | 17:50 | |
tobiash | tristanC: I added some thoughts to https://review.opendev.org/590092 | 17:59 |
tobiash | corvus: I'd be curious what you think ^ | 18:01 |
*** electrofelix has quit IRC | 18:03 | |
corvus | tobiash: i think i agree about forwarding zuul_return. i'm not sure about the rest right now. | 18:06 |
tobiash | corvus: thanks | 18:10 |
*** jpena is now known as jpena|off | 18:13 | |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: WIP: registry test job https://review.opendev.org/661327 | 18:17 |
*** pcaruana has joined #zuul | 18:24 | |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: WIP: registry test job https://review.opendev.org/661327 | 18:38 |
*** nickx-intel has joined #zuul | 18:54 | |
nickx-intel | how do I inherit variables from main.yaml > main-task role > leaf-task role ? | 18:55 |
nickx-intel | weird | 19:02 |
nickx-intel | it's erroring because it's finding variablename in - name: but variablename isn't noted by {{}} | 19:03 |
nickx-intel | can't I escape variablename so that it doesn't try to parse - name: "stuff" | 19:03 |
pabelanger | nickx-intel: where is your variable? | 19:03 |
nickx-intel | pabelanger, I have it declared in run.yaml variables | 19:04 |
pabelanger | nickx-intel: you can look to inventory file for it | 19:05 |
pabelanger | it depends how you are setting it | 19:05 |
nickx-intel | hmm | 19:05 |
pabelanger | if you are using set_fact, facts don't persist across ansible-playbook runs | 19:05 |
nickx-intel | does run.yaml vars: use set_facts? | 19:06 |
nickx-intel | implicitly? | 19:06 |
openstackgerrit | David Shrewsbury proposed zuul/zuul master: Store autohold requests in zookeeper https://review.opendev.org/661114 | 19:06 |
pabelanger | nickx-intel: what is run.yaml? | 19:06 |
pabelanger | is that a pre-run / run / post-run playbook? | 19:07 |
pabelanger | in your zuul job | 19:07 |
nickx-intel | it's a run playbook | 19:07 |
pabelanger | nickx-intel: how are you setting the fact? It would only be set_fact, if you called that task | 19:08 |
pabelanger | other wise, if a zuul job variable, that will be stored in the inventory file | 19:08 |
nickx-intel | pabelanger, setting implicitly? idk? it's not an explicitly defined variable assignment. like. it just does like this, | 19:10 |
nickx-intel | vars: | 19:10 |
nickx-intel | key: value | 19:10 |
nickx-intel | key2: value2 | 19:10 |
pabelanger | yah, if that is in your play, that should work | 19:10 |
nickx-intel | does this implicitly call set_fact? | 19:10 |
pabelanger | no | 19:10 |
pabelanger | nickx-intel: https://docs.ansible.com/ansible/latest/user_guide/playbooks_variables.html explains all the fun with variables in ansible | 19:11 |
nickx-intel | do I need to call branch_role(vars) leaf_role(vars) or something? | 19:11 |
pabelanger | nope | 19:11 |
pabelanger | you should be able to call | 19:11 |
pabelanger | - shell: "echo {{ key }}" | 19:12 |
pabelanger | and it works | 19:12 |
pabelanger | in the same play | 19:12 |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: WIP: registry test job https://review.opendev.org/661327 | 19:13 |
nickx-intel | I'm trying to implement include_role: leaf_role | 19:13 |
nickx-intel | but my leaf_role is dumb | 19:13 |
pabelanger | you might need to pass vars into include_role, see: https://docs.ansible.com/ansible/latest/modules/include_role_module.html | 19:14 |
pabelanger | you're likely hitting a scoping issue | 19:14 |
pabelanger | Pass variables to role example in link above | 19:14 |
nickx-intel | yeah that's my apparent position pabelanger, vis branch_role(vars) leaf_role(vars) :) | 19:14 |
nickx-intel | I'll dig more after lunch, I think this is sufficient, thank you pabelanger for confirming my suspicion | 19:15 |
nickx-intel | I'll post my fix after I fix lol :) | 19:16 |
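The resolution the thread converges on can be sketched as a play. Play-level `vars:` are visible to tasks in the same play, and `set_fact` is not involved; for an included role, pass the variables explicitly as pabelanger suggests. Role and variable names below are placeholders:

```yaml
# run.yaml -- a minimal sketch, assuming a role named leaf_role exists
- hosts: all
  vars:
    key: value
    key2: value2
  tasks:
    - name: Echo a play var (works directly in the same play)
      shell: "echo {{ key }}"

    - name: Include a role, passing the vars explicitly
      include_role:
        name: leaf_role
      vars:
        key: "{{ key }}"
        key2: "{{ key2 }}"
```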
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: WIP: registry test job https://review.opendev.org/661327 | 19:24 |
*** tosky has joined #zuul | 19:38 | |
*** rlandy is now known as rlandy|brb | 19:39 | |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: WIP: registry test job https://review.opendev.org/661327 | 19:52 |
*** rlandy|brb is now known as rlandy | 20:08 | |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: WIP: registry test job https://review.opendev.org/661327 | 20:12 |
openstackgerrit | Clark Boylan proposed zuul/zuul master: Update axios version and yarn.lock https://review.opendev.org/662316 | 20:16 |
clarkb | tristanC: one thing I notice about my change at ^ is that it added integrity shas to the packages in my yarn.lock update but we don't seem to have those on the other locked packages | 20:18 |
clarkb | tristanC: does that mean I did something wrong or will it add those optimistically? | 20:19 |
clarkb | reading on the internet seems like older yarn didn't add those and newer yarn does. Maybe the version of yarn used to generate the existing lock file was older? | 20:27 |
clarkb | seems like checking package hashes is a good thing so I don't think I'll try to undo it unless someone says we need to | 20:27 |
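For context on the integrity lines clarkb is seeing: newer yarn (1.10 and later) writes a Subresource-Integrity-style `integrity` field into each lockfile entry, which older yarn omitted. A lockfile entry then looks roughly like this (version and hashes below are illustrative placeholders):

```
axios@^0.19.0:
  version "0.19.0"
  resolved "https://registry.yarnpkg.com/axios/-/axios-0.19.0.tgz#..."
  integrity sha512-...
```

Entries generated by older yarn have only the `resolved` line, which is why a partial update with newer yarn leaves the lockfile with a mix of hashed and unhashed packages.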
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: WIP: registry test job https://review.opendev.org/661327 | 20:32 |
*** pcaruana has quit IRC | 21:18 | |
pabelanger | tobiash: have you seen this error before with github? http://paste.openstack.org/show/752331/ | 21:27 |
openstackgerrit | Clark Boylan proposed zuul/zuul master: Update axios version and yarn.lock https://review.opendev.org/662316 | 21:27 |
openstackgerrit | Clark Boylan proposed zuul/zuul master: Use nodejs v10 in testing https://review.opendev.org/662339 | 21:27 |
clarkb | the axios change failed and in debugging it I noticed we use nodejs 6, 8 and 10 in different jobs | 21:30 |
clarkb | I've tried to make it nodejs 10 across the board in hopes that also fixes my axios problem | 21:30 |
clarkb | but I think we should use nodejs10 regardless | 21:30 |
fungi | likely so | 21:31 |
clarkb | ok new nodejs doesn't fix the axios change issue | 21:38 |
* clarkb generates new yarn.lock from scratch | 21:38 | |
clarkb | after rebuilding venv so that it has nodejs 10 in it | 21:38 |
fungi | web browsers are so last decade anyway | 21:40 |
pabelanger | tobiash: it looks like some PR reviews, don't have commit_id: https://api.github.com/repos/ansible/ansible/pulls/45469/reviews | 21:43 |
pabelanger | but I don't know why | 21:44 |
openstackgerrit | Clark Boylan proposed zuul/zuul master: Update axios version and yarn.lock https://review.opendev.org/662316 | 21:45 |
pabelanger | https://github.com/ansible/ansible/pull/45469/files/1feaf0f2df238cf6788c65c80f08e655891091f6 | 21:45 |
pabelanger | looks to be deleted? | 21:45 |
pabelanger | jlk: when you have spare cycles, I'd be interested in what you think we need to do about pull reviews missing a commit_id, see pb above | 21:46 |
clarkb | pabelanger: maybe that happens if you do a rebase and replace the old commits? | 21:46 |
clarkb | github has in the past not been great about keeping that data around | 21:46 |
pabelanger | clarkb: maybe | 21:47 |
clarkb | it does keep diff contexts now but last I checked the commits are gone | 21:47 |
clarkb | and it is the first 2 comments that don't have it in this case which would fit under that I think | 21:47 |
pabelanger | but looks like we need to update github3.py, because https://github.com/sigmavirus24/github3.py/blob/master/src/github3/pulls.py#L961 is where it is failing | 21:47 |
pabelanger | not sure what we should do in that case | 21:48 |
clarkb | pabelanger: ya seems like it | 21:48 |
pabelanger | I'm not even sure what we are using pullreviews for right now | 21:50 |
pabelanger | I guess for pipeline trigger | 21:51 |
pabelanger | so, in our case, we likely don't care about commit_id | 21:51 |
clarkb | maybe not? if you trigger on approvals or rechecks and expect the commit id to identify what to test, we might; but I think we always use the current state of the PR HEAD, so it is similar to gerrit in that way | 21:53 |
clarkb | ok someone smarter than me will have to figure out the axios bump. The other change https://review.opendev.org/662339 is good to go I expect | 22:01 |
jlk | pabelanger: My thought is that if it's missing a commit_id it gets discarded. But _also_ this looks like a bug in github3.py; always assuming there is a commit_id. That's probably my code. | 22:06 |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: WIP: registry test job https://review.opendev.org/661327 | 22:21 |
pabelanger | clarkb: jlk: my 2min fix: https://github.com/sigmavirus24/github3.py/pull/944 | 22:27 |
pabelanger | will look at tests in a bit to see if we need to add coverage | 22:27 |
jlk | alrighty. I think there might be, but again I wrote most of that so it's possible I didn't do it right. | 22:27 |
jlk | I'm asking internally about this. It should be documented. | 22:27 |
pabelanger | cool, thanks | 22:27 |
clarkb | pabelanger: we may need to update zuul to check for a None commit_id after that goes in but that is probably the right approach | 22:28 |
jlk | Yes, if our assumption is correct (a review for a commit that was force pushed out of the branch) then the proper thing to do is toss the review. | 22:29 |
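The "toss the review" behavior jlk describes can be sketched as a small filter: GitHub apparently omits `commit_id` for reviews whose commit was force-pushed off the branch, so such reviews are discarded rather than processed. This is a hypothetical helper illustrating the logic, not the actual Zuul or github3.py code:

```python
def usable_reviews(reviews):
    """Filter out PR reviews missing a commit_id (hypothetical helper;
    GitHub omits commit_id when the reviewed commit was force-pushed
    off the branch)."""
    kept = []
    for review in reviews:
        if not review.get("commit_id"):
            # Review references a commit no longer on the branch; discard.
            continue
        kept.append(review)
    return kept


reviews = [
    {"id": 1, "commit_id": None},       # commit force-pushed away
    {"id": 2, "commit_id": "1feaf0f"},  # still on the branch
]
assert [r["id"] for r in usable_reviews(reviews)] == [2]
```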
pabelanger | clarkb: yah, seems like a good idea | 22:29 |
pabelanger | doing that patch now | 22:30 |
openstackgerrit | Paul Belanger proposed zuul/zuul master: Discard GitHub PullReview if incomplete https://review.opendev.org/662347 | 22:35 |
pabelanger | clarkb: jlk: believe that is what you are suggesting^ | 22:35 |
openstackgerrit | Paul Belanger proposed zuul/zuul master: Discard GitHub PullReview if incomplete https://review.opendev.org/662347 | 22:37 |
pabelanger | heh | 22:38 |
pabelanger | zuul can't seem to merge depends-on on ^ | 22:38 |
pabelanger | let me remove to confirm | 22:38 |
openstackgerrit | Paul Belanger proposed zuul/zuul master: Discard GitHub PullReview if incomplete https://review.opendev.org/662347 | 22:38 |
pabelanger | HA | 22:45 |
pabelanger | clarkb: zuul.o.o is getting the same commit_id exception on that review | 22:45 |
jlk | odd | 22:46 |
pabelanger | that is why there is a merge failure it seems | 22:46 |
jlk | oh, because it was trying to see if that referenced PR is mergeable yet, by looking at reviews? | 22:46 |
pabelanger | that is what I'm looking at now | 22:46 |
pabelanger | oh, maybe it was another event from github | 22:48 |
pabelanger | I'll have to dig later on zuul.o.o | 22:48 |
tristanC | clarkb: that should be fine, perhaps you need to also bump (or remove) the yarn version from the lock/packages.json | 22:56 |
clarkb | tristanC: it did bump it but is still failing | 23:00 |
tristanC | clarkb: i meant in the packages.json, though i don't know if yarn should install itself or if it's safe to use a global one | 23:09 |
clarkb | ah | 23:11 |
tristanC | clarkb: and it seems like the yarn.lock change bumps versions for un-pinned dependencies like eslint-plugin-react (from 7.11 to 7.13) | 23:11 |
tristanC | which may not be compatible with the pinned one like react-scripts 1.14 | 23:11 |
tristanC | clarkb: perhaps we should try to rebase on https://review.opendev.org/659991 | 23:11 |
openstackgerrit | Clark Boylan proposed zuul/zuul master: Update axios version and yarn.lock https://review.opendev.org/662316 | 23:13 |
clarkb | is that what you mean about the yarn versions? | 23:13 |
clarkb | and ya wouldn't surprise me if we need to update other things and so basing it on that revert might be the way to go | 23:13 |
clarkb | or update the revert to update axios | 23:13 |
*** ianychoi has quit IRC | 23:13 | |
*** rlandy has quit IRC | 23:16 | |
*** panda|ruck has quit IRC | 23:22 | |
*** panda has joined #zuul | 23:23 | |
*** tosky has quit IRC | 23:33 | |
*** tjgresha has joined #zuul | 23:50 | |
*** tjgresha has quit IRC | 23:55 | |
*** tjgresha has joined #zuul | 23:55 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!