Tuesday, 2019-09-17

*** jamesmcarthur has joined #zuul		00:21
*** jamesmcarthur has quit IRC		00:23
*** jamesmcarthur has joined #zuul		00:23
*** jamesmcarthur has quit IRC		01:05
*** igordc has quit IRC		01:08
*** mattw4 has quit IRC		01:09
ianw	i'm coming to the conclusion there really is a problem with the log streamer when ansible is running under python3 ... do we really test that in production?	01:10
ianw	i think even on our bionic nodes, we're not setting ansible_python_interpreter	01:10
clarkb	we set it to 2 iirc	01:11
ianw	it works locally testing zuul_stream ... but reliably fails https://review.opendev.org/#/c/682275 :/	01:25
*** Goneri has quit IRC		01:31
*** rfolco has quit IRC		02:28
*** roman_g has quit IRC		02:33
*** jamesmcarthur has joined #zuul		02:38
*** bhavikdbavishi has joined #zuul		02:43
SpamapS	ianw: I use python3 exclusively in my setup. What's the problem?	02:43
ianw	SpamapS: so you set python-path on your dib nodes to "/usr/bin/python3" as well?	02:44
SpamapS	I don't have "dib nodes" ... I'm on AWS with packer-built AMI's.	02:45
*** bhavikdbavishi1 has joined #zuul		02:45
SpamapS	I've been doing this a long time, I set ansible_python_interpreter=/usr/bin/python3 in my site variables.	02:46
SpamapS	I think I did this before nodepool had a facility for that.	02:46
ianw	hrm, well it's definitely failing in the gate remote tests, and i think what's happening is the streamer plugin is somehow failing	02:47
SpamapS	And my only working OS is Ubuntu 18.04	02:47
*** bhavikdbavishi has quit IRC		02:47
ianw	you get the last task output, then http://paste.openstack.org/show/777058/	02:47
*** bhavikdbavishi1 is now known as bhavikdbavishi		02:47
ianw	Ansible output terminated	02:48
SpamapS	Hm, let me see	02:48
SpamapS	in the executor?	02:48
ianw	you can see the failure in https://openstack.fortnebula.com:13808/v1/AUTH_e8fd161dc34c421a979a9e6421f823e9/zuul_opendev_logs_539/682275/2/check/zuul-tox-remote/5391677/testr_results.html.gz	02:49
SpamapS	I see "Ansible output terminated" at the end of every job's ansible summary.	02:49
SpamapS	but there's no traceback or anything	02:49
ianw	sorry, i think the error is more "[Zuul] Log Stream did not terminate"	02:50
*** persia has quit IRC		02:50
ianw	then the job aborts	02:51
SpamapS	I do see that now and then	02:51
ianw	i think that means the streaming callback plugin died, somehow ... but figuring out how is currently how i'm stumped :)	02:52
SpamapS	ahh yeah, perhaps we need to wrap it in a try/except that writes the exception into a tempfile.	02:53
ianw	hrrm, i could wrap all functions in a decorator for that ...	02:55
*** persia has joined #zuul		02:56
*** jamesmcarthur has quit IRC		02:57
SpamapS	Yeah I guess since it's a plugin that's the way you'd have to do it.	03:03
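
The try/except-to-tempfile idea discussed above could look roughly like the following. This is only a sketch: `log_exceptions`, `FakeCallback` and the `callback-error-` file prefix are hypothetical names made up for illustration, not anything from the Zuul streaming plugin.

```python
import functools
import tempfile
import traceback


def log_exceptions(func):
    """Write any exception raised by func to a temp file, then re-raise it."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # delete=False keeps the file on disk so it can be inspected
            # after the process that hit the error has gone away.
            with tempfile.NamedTemporaryFile(
                    mode="w", prefix="callback-error-", suffix=".log",
                    delete=False) as f:
                f.write(f"{func.__name__} failed:\n")
                f.write(traceback.format_exc())
                wrapper.last_error_path = f.name
            raise
    wrapper.last_error_path = None
    return wrapper


class FakeCallback:
    """Hypothetical stand-in for a callback plugin class."""

    @log_exceptions
    def on_task_start(self, task):
        return task["name"]
```

Re-raising keeps the original behaviour intact, so the decorator only adds a trace on disk and does not hide the failure.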
ianw	http://paste.openstack.org/show/777059/ ... something seems to be going bananas	03:17
ianw	constantly forking and "ansible_zuul_console_payload_69fl742k" is somehow involved ... this seems to suggest "zuul_console:" somehow :/	03:18
*** persia has quit IRC		03:18
*** persia has joined #zuul		03:24
*** jamesmcarthur has joined #zuul		03:56
*** jamesmcarthur has joined #zuul		03:57
ianw	i think that might be a red herring ... the console streamer is trying to open a file that never appears maybe	04:00
*** jamesmcarthur has quit IRC		04:51
*** bolg has joined #zuul		05:15
*** pcaruana has joined #zuul		05:16
*** jamesmcarthur has joined #zuul		05:21
*** pcaruana has quit IRC		05:29
*** sshnaidm\|afk is now known as sshnaidm\|pto		05:35
*** AJaeger has quit IRC		05:52
*** sanjayu_ has joined #zuul		05:54
*** spsurya has joined #zuul		05:55
*** AJaeger has joined #zuul		05:56
*** sanjayu_ has quit IRC		06:00
*** saneax has joined #zuul		06:01
*** jamesmcarthur has quit IRC		06:10
*** themroc has joined #zuul		06:27
*** jamesmcarthur has joined #zuul		06:32
*** avass has joined #zuul		06:36
*** jamesmcarthur has quit IRC		06:37
*** jamesmcarthur has joined #zuul		06:45
openstackgerrit	Ian Wienand proposed zuul/zuul master: [dnm] testing python3 ansible https://review.opendev.org/682556	06:57
*** jamesmcarthur has quit IRC		07:17
*** tosky has joined #zuul		07:27
*** armstrongs has joined #zuul		07:27
*** jangutter has joined #zuul		07:28
*** saneax has quit IRC		07:28
*** armstrongs has quit IRC		07:37
*** jamesmcarthur has joined #zuul		07:44
*** bhavikdbavishi has quit IRC		07:44
*** jpena\|off is now known as jpena		07:46
*** jamesmcarthur has quit IRC		07:50
*** jamesmcarthur has joined #zuul		07:52
ianw	https://openstack.fortnebula.com:13808/v1/AUTH_e8fd161dc34c421a979a9e6421f823e9/zuul_opendev_logs_cfd/682556/1/check/zuul-tox-remote/cfd83a2/testr_results.html.gz	07:53
ianw	mordred: ^ if you could figure out any clues as to why this seemingly small change to python3 leads to ^ https://review.opendev.org/#/c/682556/1/tests/base.py which appears to me to be an issue with the streaming?	07:54
*** hashar has joined #zuul		07:58
mordred	ianw: 2019-09-17 07:19:04,069 zuul.AnsibleJob.output seems to be the last chunk that ran - and I don't know if that traceback is expected	08:00
mordred	ianw: maybe there's a behavior/traceback change under 3 that's not getting caught properly? that test is testing if something doesn't exist - so maybe something that we're doing is not handling an error properly when running under 3?	08:01
mordred	but - otherwise, no, I don't have an immediate thought	08:01
ianw	mordred: yeah, i just can't find it :(	08:02
ianw	2019-09-17 07:20:04.902649 \| ubuntu-bionic \| b"2019-09-17 07:19:34,105 zuul.AnsibleJob.output DEBUG [e: 32de590007fe490da9e7b9cc89391a90] [build: c1a1bde54c19483a97a26774f9add01f] Ansible output: b'[Zuul] Log Stream did not terminate'"	08:02
ianw	I think that is probably a problem?	08:02
mordred	maybe?	08:02
ianw	dunno, throwing in the towel on this one for today, anyway	08:05
ianw	the bigger problem is that i want to use python-path python3 for fedora 30 -> https://review.opendev.org/682569	08:06
ianw	i think that for ansible >=2.8 zuul should be able to just leave ansible up to its own devices on this one (https://review.opendev.org/682275) but have to figure out what's going on with this failure first	08:07
mordred	ianw: you've got all the fun ones :)	08:11
*** jamesmcarthur has quit IRC		08:15
*** jamesmcarthur has joined #zuul		08:16
*** igordc has joined #zuul		08:20
*** bhavikdbavishi has joined #zuul		08:25
*** igordc has quit IRC		08:28
*** noorul has joined #zuul		08:31
noorul	hi	08:31
noorul	Is there a way to define a project dependent on another project?	08:32
SpamapS	noorul: https://zuul-ci.org/docs/zuul/user/config.html#attr-job.required-projects	08:37
SpamapS	noorul: but that's job->project.	08:37
SpamapS	so it can be inferred by project->job->required-project	08:38
openstackgerrit	Merged zuul/zuul-jobs master: Add a netconsole role https://review.opendev.org/680901	08:38
noorul	SpamapS: Thanks! Which branch will it checkout?	08:39
avass	noorul: https://zuul-ci.org/docs/zuul/user/config.html#attr-job.required-projects.override-checkout	08:40
noorul	avass: Thank you! Is there an example?	08:41
avass	noorul: So the same branch as the branch triggering the job unless override-checkout is specified	08:41
noorul	avass: What happens if that branch does not exist in the required project?	08:41
avass	noorul: Not sure, I guess the job would fail	08:42
noorul	Oops!	08:42
noorul	I am thinking how can I do the following	08:43
noorul	I have two repositories repo1 and repo2	08:43
noorul	both of them have release branches	08:43
noorul	rel_1.1.1	08:43
noorul	Now I have a private branch br1 and using that I raise a PR	08:44
noorul	If I define repo2 as a required project for the job	08:44
noorul	Any idea what will happen?	08:44
ianw	mordred: when you say "not handling an error properly" do you mean in the streamer, or somewhere else in zuul that might decide to abort the job? http://paste.openstack.org/show/777068/ is what i see trying with local testing, but i can't make the streamer fail :/	08:46
avass	noorul: I guess that it would try to checkout your private branch on both repositories unless override-checkout attribute is specified	08:48
avass	noorul: But I'm not sure exactly how it works since we're not using it yet :)	08:49
avass	noorul: unless I'm reading this wrong	08:51
avass	noorul: Seems strange to me that it would try to checkout the same ref for both projects since you can't guarantee that the same ref exists in both repos.	08:55
mordred	ianw: yeah - I'm honestly not sure what I mean - I'm a bit grasping at straws there	08:58
mordred	ianw: oh yeah - hrm. both are showing a traceback	09:00
avass	mordred: could you shine some light on how this works? https://zuul-ci.org/docs/zuul/user/config.html#attr-job.required-projects.override-checkout	09:02
avass	mordred: actually I mean this: https://zuul-ci.org/docs/zuul/user/config.html#attr-job.required-projects.override-checkout	09:02
avass	mordred: which branch/ref does it checkout by default? master?	09:03
mordred	well - by default it checks out the branch matching the target, and if it can't find that, it'll fall back to master. if override-checkout is defined, it'll use that	09:04
avass	mordred: ah, that makes sense	09:04
mordred	in the example from noorul above, the private branch br1 isn't relevant if it's the source of the PR	09:04
mordred	assuming the PR is targetting one of the regular shared long-lived branches	09:05
noorul	mordred: In my example I want repo2's rel_1.1.1 to be checked out	09:06
noorul	mordred: But looks like master will be checked out	09:06
mordred	noorul: if you are submitting the PR to repo1's rel_1.1.1 branch, zuul should also check out rel_1.1.1 of repo2	09:08
noorul	So, if br1 exists in repo2, it will be checked out, otherwise rel_1.1.1	09:10
noorul	Is my understanding correct?	09:10
mordred	noorul: ah - no - sorry, I misunderstood what you meant by private branch. you mean you have a branch, br1, on the main shared repo1 and you are submitting PRs to that branch	09:15
mordred	am I understanding that right?	09:15
noorul	Not exactly	09:16
noorul	I am submitting a PR from br1 to rel_1.1.1 of repo1	09:16
mordred	ah - awesome	09:16
noorul	and say I have in the job required_projects - repo2	09:16
mordred	in that case, br1 shouldn't play into the decision making from zuul at all	09:16
mordred	it's about which branch you are submitting a change to - so since you are submitting from br1 to rel_1.1.1 of repo1- then if you add repo2 into the required_projects, zuul should default to checking out rel_1.1.1 of repo2	09:17
noorul	I see	09:18
mordred	so in this case you should not need an override-checkout and zuul should do the right thing	09:18
noorul	What is the use case of override-checkout?	09:18
*** pcaruana has joined #zuul		09:19
mordred	in case the repos don't share a common structure. for instance, I have a job in openstacksdk that tests against ansible and has required-projects: github.com/ansible/ansible ... in this case, I want to test stable/rocky of openstacksdk against stable-2.6 of ansible - so I use override-checkout to tell zuul about that	09:20
mordred	you could also use it if you wanted to define an additional job that tested your rel_1.1.1 of repo1 against master of repo2 - to verify that a change worked both with a release and had future upwards compat, for instance	09:21
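
The checkout rule described above (use override-checkout if set, otherwise the branch matching the change's target, otherwise master) can be modelled as a small function. This is a toy model of the explanation in this conversation, not Zuul's actual code; `pick_checkout` and its parameters are hypothetical names.

```python
from typing import Optional, Set


def pick_checkout(target_branch: str,
                  repo_branches: Set[str],
                  override_checkout: Optional[str] = None,
                  fallback: str = "master") -> str:
    """Pick the ref to check out in a required project (toy model)."""
    # An explicit override always wins.
    if override_checkout is not None:
        return override_checkout
    # Otherwise follow the branch the change is proposed against.
    if target_branch in repo_branches:
        return target_branch
    # No matching branch in the required project: fall back.
    return fallback


# A PR from br1 to rel_1.1.1 of repo1, with repo2 as a required project.
print(pick_checkout("rel_1.1.1", {"master", "rel_1.1.1"}))
```

Note that the PR's source branch (br1 in the example) is deliberately not an input: only the target branch takes part in the decision.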
noorul	does it support wildcard ?	09:22
mordred	no - only direct values. however- the repos all have all of their branches in correct state, so if you need to get more clever, you can always do a git checkout in a job	09:24
*** saneax has joined #zuul		09:24
mordred	https://opendev.org/opendev/system-config/src/branch/master/.zuul.yaml#L192-L229 <-- this is an example of a job where we want stable-2.15 of a bunch of gerrit repos on patches to master of opendev/system-config	09:24
noorul	I see	09:25
noorul	thanks for that example	09:25
noorul	I have a very simple job http://paste.openstack.org/show/777072/	09:25
noorul	but it fails	09:25
noorul	/bin/sh: 1: ./run_tests.sh: not found	09:25
*** sanjayu_ has joined #zuul		09:25
noorul	But the file exists	09:26
mordred	you need to change directories to your repo	09:26
noorul	inside run_tests.sh?	09:26
mordred	no - in the job - that shell command is going to be running with cwd of /home/zuul	09:27
mordred	but your repo will be in something like /home/zuul/src/opendev.org/openstack/repo1	09:27
noorul	Hmm	09:27
mordred	(repos are put in golang format on disk inside of the src dir)	09:27
noorul	Is there an example?	09:28
mordred	so - http://paste.openstack.org/show/777073/	09:28
avass	noorul: you probably want to put a chdir: {{ zuul.project.src_dir }} on the shell command	09:28
*** saneax has quit IRC		09:29
mordred	https://opendev.org/opendev/system-config/src/branch/master/playbooks/zuul/gerrit/repos.yaml	09:29
mordred	yah	09:29
mordred	{{ zuul.project.src_dir }} is great for this case	09:29
*** roman_g has joined #zuul		09:30
*** jamesmcarthur has quit IRC		09:31
*** jamesmcarthur has joined #zuul		09:32
*** noorul has quit IRC		09:42
openstackgerrit	Ian Wienand proposed zuul/zuul master: [dnm] testing python3 ansible https://review.opendev.org/682556	09:50
*** noorul has joined #zuul		09:54
noorul	mordred: http://paste.openstack.org/show/777075/	09:54
avass	noorul: found unacceptable key (unhashable type: 'AnsibleMapping')	10:03
avass	noorul: https://docs.ansible.com/ansible/2.5/user_guide/playbooks_variables.html#hey-wait-a-yaml-gotcha	10:03
avass	noorul: The jinja expression needs to be put in quotes otherwise ansible will think it's a yaml dictionary	10:07
*** jamesmcarthur has quit IRC		10:07
*** pcaruana has quit IRC		10:07
avass	noorul: But only if a value is started with an expression	10:07
avass	So it should have been chdir: "{{ zuul.project.src_dir }}", my fault :)	10:08
*** bhavikdbavishi has quit IRC		10:14
noorul	avass: Got it	10:16
*** recheck has quit IRC		10:17
noorul	Is it possible to define ansible role in non-trusted project?	10:17
*** recheck has joined #zuul		10:18
*** recheck has quit IRC		10:22
*** recheck has joined #zuul		10:23
mordred	absolutely	10:24
*** recheck has quit IRC		10:24
*** recheck has joined #zuul		10:25
openstackgerrit	Ian Wienand proposed zuul/zuul master: [dnm] testing python3 ansible https://review.opendev.org/682556	10:27
*** recheck has quit IRC		10:28
*** recheck has joined #zuul		10:29
*** recheck has quit IRC		10:31
*** recheck has joined #zuul		10:31
*** noorul has quit IRC		10:34
*** recheck has quit IRC		10:37
*** recheck has joined #zuul		10:38
*** jamesmcarthur has joined #zuul		10:40
*** recheck has quit IRC		10:41
*** recheck has joined #zuul		10:41
*** hashar has quit IRC		10:44
*** pcaruana has joined #zuul		10:55
*** noorul has joined #zuul		11:00
*** avass has quit IRC		11:07
*** hashar has joined #zuul		11:09
*** noorul has quit IRC		11:11
*** jamesmcarthur has quit IRC		11:12
*** jpena is now known as jpena\|lunch		11:17
*** panda is now known as panda\|ruck		11:41
*** sanjayu_ has quit IRC		11:44
*** gtema_ has joined #zuul		11:53
*** gtema_ has quit IRC		11:56
*** bhavikdbavishi has joined #zuul		11:59
*** rfolco has joined #zuul		12:04
*** jangutter_ has joined #zuul		12:05
openstackgerrit	Jan Kubovy proposed zuul/zuul master: Evaluate CODEOWNERS settings during canMerge check https://review.opendev.org/644557	12:07
*** jangutter has quit IRC		12:08
*** jamesmcarthur has joined #zuul		12:10
*** jamesmcarthur has quit IRC		12:10
*** jamesmcarthur_ has joined #zuul		12:10
*** jangutter_ has quit IRC		12:11
*** rlandy has joined #zuul		12:18
*** bhavikdbavishi1 has joined #zuul		12:22
*** gtema_ has joined #zuul		12:23
*** bhavikdbavishi has quit IRC		12:24
*** bhavikdbavishi1 is now known as bhavikdbavishi		12:24
*** jamesmcarthur_ has quit IRC		12:26
*** pcaruana has quit IRC		12:27
*** openstackstatus has quit IRC		12:28
*** openstack has joined #zuul		12:32
*** ChanServ sets mode: +o openstack		12:32
*** sanjayu_ has joined #zuul		12:33
*** AJaeger has quit IRC		12:34
*** AJaeger has joined #zuul		12:36
*** jpena\|lunch is now known as jpena		12:37
*** Goneri has joined #zuul		12:39
*** bhavikdbavishi has quit IRC		12:46
*** fdegir has quit IRC		12:47
*** fdegir has joined #zuul		12:48
*** mattymo has joined #zuul		12:53
*** gtema_ has quit IRC		12:56
*** pcaruana has joined #zuul		12:59
mattymo	Anyone familiar with nodepool that could help me out? I can get nodepool to do rhel7 registration just fine, but it unregisters at the end of build. That's okay. I want now to do RH registration when nodepool-launcher launches an openstack instance	12:59
mattymo	or is this something beyond nodepool's scope?	12:59
pabelanger	mattymo: yes, you'd need to set up a zuul pre-run playbook to do that	13:03
pabelanger	nodepool no longer has the ability to modify a node at launch time, only zuul can	13:03
mordred	what an interesting use case ...	13:03
pabelanger	mattymo: other option, is create some sort of boot script, that does it when the node first launches	13:04
pabelanger	we do that today with some dns things in opendev, and for ansible-network setting up network appliance config	13:04
*** gtema_ has joined #zuul		13:07
mordred	pabelanger: this seems like a good description of a type of action that might (or might not) be worth pondering. because the activity in question is tied to a nodeset / label and isn't really job specific. I don't think I'd pondered the rhel activation use case before (I mean - obviously can do with a pre-run - it's just an interesting case to consider)	13:07
fungi	it also could be the case that you want to restrict to some specific maximum number of rhel nodes running simultaneously because you only have a certain number of licenses? if so that starts to look a lot like a (perhaps label-specific) quota	13:09
mattymo	pabelanger, is that dns boot script on a public git repo?	13:10
fungi	i wonder if a nodepool driver shim might be a way to solve it	13:10
pabelanger	mordred: yup, there are likely a few network appliance related things we can lump into that too. X that needs to be accomplished to finish the image build process. For now, we solved that with pre-run jobs in zuul, but it is complicated to control which playbooks run via hosts	13:11
pabelanger	mattymo: https://opendev.org/openstack/project-config/src/branch/master/nodepool/elements/nodepool-base/finalise.d/89-unbound#L138	13:13
*** bhavikdbavishi has joined #zuul		13:13
*** brendangalloway has joined #zuul		13:14
tristanC	corvus: is it ok if I +3 the pagure patches from fbo you already +2?	13:14
mattymo	thankfully in my case I have plenty of licenses	13:14
mattymo	I just don't want creds to ever get stored on the target host	13:14
mattymo	but the way my deployment runs, ansible runs only on the target host	13:15
brendangalloway	I'm writing a job that redeploys a static host (by triggering a pxe boot) in the middle	13:19
brendangalloway	However, I'm getting a failure on wait_for_reconnect timing out even though the host has come back up after the reinstall	13:20
pabelanger	mattymo: yah, in that case, might be better to have zuul do that step. So, you can store that data as a secret in zuul	13:20
brendangalloway	Running the ansible role outside of zuul succeeds, and I am able to reestablish a connection and continue the job	13:20
pabelanger	then hope (which I haven't figured out or even tested) that the job doesn't try to leak the license info	13:20
brendangalloway	Any suggestions on debugging the state of the executor at the time of the failure to figure out what is going wrong?	13:22
pabelanger	brendangalloway: I'm not sure what wait_for_reconnect is, that something you created?	13:23
pabelanger	or using wait_for_connection	13:23
brendangalloway	pabelanger: yes, wait_for_connection	13:24
pabelanger	brendangalloway: maybe first use wait_for to ensure SSH port is open?	13:25
pabelanger	we do that today, and works well	13:25
brendangalloway	ok, let me try that	13:25
*** bhavikdbavishi has quit IRC		13:25
*** bhavikdbavishi has joined #zuul		13:26
*** pcaruana has quit IRC		13:28
openstackgerrit	Fabien Boucher proposed zuul/zuul master: Pagure - add support for git.tag.creation event https://review.opendev.org/679938	13:33
pabelanger	mordred: do you remember a time, where zuul might have been running multiple ansible shell tasks, on the same node from nodepool, at the same time? For some reason I thought that was a problem a while back. Basically, I've seen something odd, where we potentially have a shell task process running twice in zuul.a.c	13:37
openstackgerrit	Fabien Boucher proposed zuul/zuul master: Pagure - add support for git.tag.creation event https://review.opendev.org/679938	13:37
pabelanger	I want to say, it was something to do with the version of command that zuul shipped for ansible?	13:38
*** avass has quit IRC		13:47
*** jamesmcarthur has quit IRC		13:47
*** sanjayu_ has quit IRC		13:48
*** pcaruana has joined #zuul		13:59
*** michael-beaver has joined #zuul		14:00
openstackgerrit	Tristan Cacqueray proposed zuul/zuul master: Disable rsh synchronize rsync_opts https://review.opendev.org/682657	14:12
mordred	pabelanger: I vaguely remember something something but also don't remember. clarkb ?	14:16
corvus	clarkb: ^ 682657	14:17
*** jamesmcarthur has joined #zuul		14:17
clarkb	corvus: tristanC done	14:18
clarkb	mordred: pabelanger I dont recall	14:18
*** jamesmcarthur has quit IRC		14:22
corvus	tristanC: re pagure, yes -- https://review.opendev.org/679938 was the only thing i was worried about. i just left a comment on that (cc fbo)	14:24
mordred	corvus: I left you a comment on your robot comments patch - possibly just to prove I'm actually reading these patches	14:25
*** jamesmcarthur has joined #zuul		14:27
corvus	mordred: excellent question	14:31
*** jamesmcarthur has quit IRC		14:32
corvus	mordred: i made an answer	14:32
mordred	corvus: cool	14:34
corvus	clarkb: do you think you could take a look soon at the gerrit stack starting at https://review.opendev.org/681132 ?	14:37
clarkb	corvus: yes after my morning meeting I can take a look	14:38
*** jamesmcarthur has joined #zuul		14:38
mordred	clarkb: I think you'll enjoy it	14:38
*** mattymo has quit IRC		14:42
*** jamesmcarthur has quit IRC		14:43
*** bolg has quit IRC		14:47
*** jamesmcarthur has joined #zuul		14:48
*** pcaruana has quit IRC		14:55
openstackgerrit	Merged zuul/zuul master: Disable rsh synchronize rsync_opts https://review.opendev.org/682657	15:08
openstackgerrit	David Shrewsbury proposed zuul/zuul master: Add scheduler max_hold_age config option. https://review.opendev.org/682675	15:28
openstackgerrit	David Shrewsbury proposed zuul/zuul master: Mark nodes as USED when deleting autohold https://review.opendev.org/664060	15:38
openstackgerrit	David Shrewsbury proposed zuul/zuul master: Auto-delete expired autohold requests https://review.opendev.org/663762	15:38
brendangalloway	pabelanger: I'm trying to use wait_for to check that the ssh service has come back up, but I can't execute the task on the executor. Not sure I correctly understood your suggestion	15:39
pabelanger	brendangalloway: do we block it? Which error are you seeing. In our multi node setup, we run it from nested ansible node to check if 2nd node is online	15:41
Shrews	tristanC: I don't really follow your comments on https://review.opendev.org/679057. Why is the tenant needed to get or delete an autohold via the web API? I was following mhu's instructions there and I don't quite get why that's required (obviously not required via CLI).	15:41
brendangalloway	pabelanger: "msg": "Executing local code is prohibited"	15:41
pabelanger	k, so in that case you need to move it to trusted playbook or maybe run from nested ansible	15:42
Shrews	corvus: is https://review.opendev.org/682675 what you had in mind to deal with the node expiration issue?	15:43
brendangalloway	We're running a single node here - we asked previously about issues trying to run jobs with both static and openstack nodes.	15:43
pabelanger	right, so the single node is still online, is that right? it is the static node you are doing a pxe with?	15:44
brendangalloway	so we don't really have an option to spin up another node in the same job to defer the wait_for to	15:46
brendangalloway	the single node is going down during the play and being pxe booted	15:46
tristanC	Shrews: when a tenant REST endpoint is white-label, user doesn't have access to /api/autohold, all their requests are scoped to /api/tenant/{user-tenant-name}/	15:47
mordred	why are we blocking wait_for ?	15:47
pabelanger	ah, so yah in that case you need to move the wait_for into a trusted playbook, to run from executor	15:47
brendangalloway	trusted playbooks are only allowed to run in a post environment correct?	15:47
mordred	no - they can run anywhere - their content just isn't executed speculatively - so if you propose a change to one, the change has to land before it takes effect	15:48
tristanC	Shrews: thus if we have /api/tenant/{tenant}/autohold to list autoholds (at L1105), then we should have /api/tenant/{tenant}/autohold/{id}	15:48
mordred	but also - I want to see if there is a way we can allow wait_for - because it seems like a sensible thing to want to do	15:48
mordred	brendangalloway: can you try using wait_for_connection instead?	15:50
pabelanger	that didn't work	15:50
brendangalloway	That was my original approach	15:50
mordred	oh. weird	15:50
mordred	k.	15:50
pabelanger	wait_for, was to scan to see if port was open	15:51
clarkb	corvus: https://review.opendev.org/#/c/682487/ is the stack you were asking for review on right?	15:51
pabelanger	then try wait_for_connection	15:51
pabelanger	brendangalloway: next option, would be shell ssh-keyscan / loop	15:51
brendangalloway	but it's getting the 'unable to ssh' error	15:51
pabelanger	but with executor, that is going to be blocked too	15:51
pabelanger	IIRC	15:51
brendangalloway	wait_for being allowed to execute would be ideal	15:51
corvus	clarkb: that's the end, https://review.opendev.org/681132 is the start	15:52
*** gtema_ has quit IRC		15:52
corvus	Shrews: generally yes -- i'll take a detailed look in a few	15:52
*** bolg has joined #zuul		15:52
*** hashar has quit IRC		15:53
*** hashar has joined #zuul		15:54
brendangalloway	pabelanger: That would also still need to be executed by trusted or a third party?	15:54
pabelanger	brendangalloway: yah, in this case, you are going to have very limited way to check in untrusted job	15:54
mordred	brendangalloway: yeah - I think allowing wait_for to be used makes sense - it might be tomorrow before I can get enough headspace to dive in to the action plugin exclusions and figure it all out	15:54
pabelanger	what I'd suggest, is figure out how to do the wait check, move that job into trusted, then parent to it from untrusted job	15:55
mordred	pabelanger, brendangalloway: what didn't work with wait_for_connection (curious)	15:55
*** mattw4 has joined #zuul		15:56
brendangalloway	pabelanger: that could be an option. Luckily the reformat is the first role called and once it's working it shouldn't need much changing	15:56
pabelanger	not sure, I'm guessing it's something with the socket from executor to node?	15:57
brendangalloway	mordred: "msg": "SSH Error: data could not be sent to remote host "<redacted>". Make sure this host can be reached over ssh"	15:57
pabelanger	wait	15:57
pabelanger	so, if a new host is coming online with pxe boot	15:57
pabelanger	it will have new hostkeys	15:57
pabelanger	and zuul-executor hasn't accepted them	15:57
*** hashar has quit IRC		15:58
brendangalloway	same playbook works when I run it from my laptop, and once nodepool sees the node, zuul accesses them just fine	15:58
pabelanger	brendangalloway: are you preserving hostkeys?	15:58
brendangalloway	pabelanger: that's already been solved	15:58
pabelanger	kk	15:58
brendangalloway	yes	15:58
*** hashar has joined #zuul		15:58
pabelanger	I wonder if ssh-agent hasn't timed out or something	15:58
pabelanger	brendangalloway: maybe try using meta reset_connection before wait_for_connection?	15:59
pabelanger	that would cause ansible-playbook to create a new connection again	15:59
brendangalloway	Is there some way I can introduce a delay on that? I'd like to wait a minute or two to make sure the connection is properly down before resetting and waiting for the connection	16:02
pabelanger	wait_for_connection has delay / sleep / timeout settings	16:03
*** noorul has joined #zuul		16:03
pabelanger	same with wait_for	16:03
pabelanger	but you can also use pause task to hardcode a delay too	16:03
brendangalloway	so wait_for_connection with ignore errors, reset connection, wait_for_connection again?	16:04
pabelanger	reboot node, reset connection, wait_for_connection (delay / sleep / timeout)	16:05
mordred	I look away for a second and come back to a very fun scrollback	16:06
brendangalloway	when I call reset connection, will it just drop the connection until the next task? Or will it try reconnect as part of the reset?	16:06
pabelanger	it will reconnect	16:06
pabelanger	on next task	16:06
brendangalloway	sounds perfect, I will test that then	16:06
pabelanger	I'm starting to think, when node is booting up again, something is going on with sshd on server side. And maybe connection is reset or something	16:07
pabelanger	why I like wait_for, is you can look for SSHd headers in connection attempt	16:07
pabelanger	eg: https://github.com/ansible/ansible-zuul-jobs/blob/master/playbooks/ansible-network-appliance-base/pre.yaml#L25	16:07
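
The "look for SSHd headers in the connection attempt" check mentioned above can be sketched outside Ansible with plain sockets. This is a rough illustration under stated assumptions, not what the linked playbook does: `is_ssh_banner` and `wait_for_ssh` are hypothetical names, and the `delay` / `sleep` / `timeout` parameters only loosely mirror the settings named earlier in the conversation.

```python
import socket
import time


def is_ssh_banner(data: bytes) -> bool:
    """Return True if data looks like an SSH identification string."""
    # Assumption: the banner is the first thing the server sends. Servers
    # are allowed to send other lines first, so this check is simplistic.
    return data.startswith(b"SSH-")


def wait_for_ssh(host: str, port: int = 22, timeout: float = 300.0,
                 delay: float = 0.0, sleep: float = 1.0) -> bool:
    """Poll host:port until an SSH banner is seen or timeout expires."""
    time.sleep(delay)  # give the node time to actually go down first
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=5) as conn:
                conn.settimeout(5)
                if is_ssh_banner(conn.recv(256)):
                    return True
        except OSError:
            # Refused, unreachable or timed out: the node is not ready yet.
            pass
        time.sleep(sleep)
    return False
```

An open port alone is not enough here: the function only succeeds once the daemon answers with a banner, which is the point of checking the header rather than just the port.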
brendangalloway	we do a normal reboot at other stages in the play and wait_for_connection works fine	16:09
*** hashar has quit IRC		16:10
noorul	Where does ansible push the code to the remote node?	16:12
clarkb	noorul: https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/prepare-workspace-git/README.rst that is the role that should be used and typically as an early action of a base job	16:14
corvus	noorul: https://zuul-ci.org/docs/zuul/admin/quick-start.html#configure-a-base-job also has more info -- you might remember doing that part when you ran through the quickstart	16:15
clarkb	corvus: left a question on https://review.opendev.org/#/c/681132/	16:15
corvus	clarkb, noorul: the quickstart uses 'prepare-workspace' -- maybe we should change it to prepare-workspace-git ?	16:15
clarkb	corvus: ++ prepare-workspace-git will handle presence of a cache or no cache	16:16
noorul	clarkb: I am using prepare-workspace	16:16
clarkb	it is more flexible	16:16
corvus	clarkb: yes. Do you want that in the form of a new patchset or followup?	16:17
clarkb	corvus: considering there are already a few moving parts and we likely need a full stack to do anything useful a followup is probably fine	16:17
corvus	clarkb: cool. i'll stage the fix that way and wait for you to finish the stack before pushing it up.	16:18
openstackgerrit	David Shrewsbury proposed zuul/zuul master: Add autohold delete/info commands to web API https://review.opendev.org/679057	16:28
*** rlandy has quit IRC		16:30
*** hashar has joined #zuul		16:33
openstackgerrit	David Shrewsbury proposed zuul/zuul master: Remove outdated TODO https://review.opendev.org/682421	16:34
*** rlandy has joined #zuul		16:35
clarkb	corvus: left some thoughts on the big change https://review.opendev.org/#/c/680778	16:35
corvus	heh, i just replied to mordreds comment on that, i'll look at clarkb's now	16:35
*** noorul has quit IRC		16:45
clarkb	corvus: https://review.opendev.org/#/c/681936/ I think I found a bug in that change	16:47
clarkb	(and I -1'd because I think merging it as is would break opendev)	16:47
corvus	clarkb: ++	16:48
*** rfolco is now known as rfolco\|dentist		16:51
SpamapS	TIL that if you have a directory in your roles path that just has a README.rst in it, Ansible will still consider that a "role", and when you depend on that role, it will happily consider it having run successfully by not doing anything at all.	16:54
clarkb	corvus: and comment on https://review.opendev.org/#/c/682487 I think all of my comments but the one on 681936 can be addressed as followups if you prefer	16:54
SpamapS	This seems like.. a less than ideal default.	16:54
*** pcaruana has joined #zuul		16:55
corvus	clarkb: what do you think of my response on that one?	16:57
paladox	corvus robot comments are in 2.15.0. At least i don't remember anyone adding it in a point release :)	16:57
corvus	paladox: cool, i'll set it to >=2.15.0 then	16:58
openstackgerrit	James E. Blair proposed zuul/zuul master: Add enqueue reporter action https://review.opendev.org/681132	16:58
paladox	corvus https://github.com/GerritCodeReview/gerrit/commit/3fde7e4e75f4653e4a56e6c38bc7718a3280bd9c#diff-a9ed91a039490d5fed094853d96608fe	16:58
openstackgerrit	James E. Blair proposed zuul/zuul master: Add no-jobs reporter action https://review.opendev.org/681278	16:58
openstackgerrit	James E. Blair proposed zuul/zuul master: Add report time to item model https://review.opendev.org/681323	16:58
openstackgerrit	James E. Blair proposed zuul/zuul master: Add Item.formatStatusUrl https://review.opendev.org/681324	16:58
openstackgerrit	James E. Blair proposed zuul/zuul master: Add support for the Gerrit checks plugin https://review.opendev.org/680778	16:58
openstackgerrit	James E. Blair proposed zuul/zuul master: Update gerrit pagination test fixtures https://review.opendev.org/682114	16:58
openstackgerrit	James E. Blair proposed zuul/zuul master: Support HTTP-only Gerrit https://review.opendev.org/681936	16:58
openstackgerrit	James E. Blair proposed zuul/zuul master: Add autogenerated tag to Gerrit reviews https://review.opendev.org/682473	16:58
openstackgerrit	James E. Blair proposed zuul/zuul master: Use robot_comments in Gerrit https://review.opendev.org/682487	16:58
corvus	clarkb: those exceeded my own threshold for followups, so i amended the original commits and redid the whole stack	16:58
clarkb	corvus: wfm	16:59
paladox	it's actually in 2.14 based on that commit!	16:59
*** jpena is now known as jpena|off		16:59
clarkb	corvus: in https://review.opendev.org/#/c/682487/3..4/zuul/driver/gerrit/gerritconnection.py you updated the > to >= but left the version as 2.15.0 instead of 2.15.16. The rest of the stack lgtm now	17:03
corvus	clarkb: yeah did that based on paladox's comment above	17:04
clarkb	oh /me catches up on irc	17:04
clarkb	aha thanks	17:04
corvus	which came right after i left the response in gerrit, sorry i didn't update	17:04
clarkb	in that case I think the whole stack is good	17:05
Shrews	corvus: been wanting to review that stack, but been dealing with a stack of my own :/	17:06
Shrews	tristanC: you want to review https://review.opendev.org/682675 that deals with the node expiration issue?	17:10
*** brendangalloway has quit IRC		17:13
tristanC	Shrews: left a comment	17:18
Shrews	tristanC: Not sure what you're asking there. That code should act the same if the user supplied 0 or not	17:20
tristanC	Shrews: if the user supplied 0, then the code defaults to the scheduler configuration value	17:21
tristanC	Shrews: e.g. shouldn't we differentiate a supplied 0, explicit no expiration, from the default cli value?	17:21
Shrews	tristanC: oh, well, that brings up a good question. Would we ever want a user supplied value to exceed our zuul's configured max? If we've configured a max of 2 days, would we want to allow something greater than that?	17:25
tristanC	Shrews: at the moment you can, but you would have to use a silly "--hold-expiration 99999999" to ensure a greater value than what is zuul's configured max	17:26
Shrews	tristanC: right. either way, yes, that needs fixed. but i think that needs to be answered before i can fix it properly	17:26
tristanC	Shrews: I guess supplied expiration should never exceed what is configured in zuul	17:27
tristanC	Shrews: or perhaps it could on the cli, but not from the rest endpoint	17:27
Shrews	corvus: after our call, maybe you have an opinion on that ^^^	17:30
*** michael-beaver has quit IRC		17:40
openstackgerrit	Merged zuul/zuul-jobs master: Update the base-roles test to use prepare-workspace-git https://review.opendev.org/680703	17:47
openstackgerrit	Merged zuul/zuul-jobs master: Clean non-bare remote repos https://review.opendev.org/680689	17:47
*** recheck has quit IRC		17:49
*** recheck has joined #zuul		17:53
*** themroc has quit IRC		17:54
*** igordc has joined #zuul		18:16
corvus	Shrews, tristanC: i don't think we want to differentiate cli vs rest -- i believe the cli is expected to move to use rest eventually anyway. the current nodepool setting is a true max -- it can't be exceeded.	18:21
corvus	Shrews, tristanC: max_hold_expiration should probably match that. maybe we want to add another option though, a default? for the situations where you might not want to set a hard limit, but you don't want the default to be unlimited.	18:22
corvus	er, 'max_hold_age'	18:22
*** bhavikdbavishi has quit IRC		18:30
Shrews	corvus: ok. i can rework it for that	18:41
Shrews	default_hold_expiration / max_hold_expiration is probably clearest	18:47
corvus	mordred: i updated the gerrit stack, so if you have a sec to re-review the updated changes, that'd be swell	18:55
*** armstrongs has joined #zuul		19:09
*** armstrongs has quit IRC		19:15
*** kerby has joined #zuul		19:21
*** bolg has quit IRC		19:38
*** spsurya has quit IRC		19:48
*** sean-k-mooney has quit IRC		20:17
*** sean-k-mooney has joined #zuul		20:25
*** panda|ruck is now known as panda|ruck|off		20:29
*** kerby has quit IRC		20:43
*** Goneri has quit IRC		20:46
openstackgerrit	Ian Wienand proposed zuul/zuul master: [dnm] testing python3 ansible https://review.opendev.org/682556	20:53
*** kerby has joined #zuul		20:56
*** hashar has quit IRC		20:57
*** pcaruana has quit IRC		20:58
corvus	zuul-maint: i pushed up https://review.opendev.org/682743 regarding http://lists.zuul-ci.org/pipermail/zuul-discuss/2019-September/001017.html	20:59
openstackgerrit	Ian Wienand proposed zuul/zuul master: [dnm] testing python3 ansible https://review.opendev.org/682556	21:15
openstackgerrit	James E. Blair proposed zuul/zuul master: Add support for the Gerrit checks plugin https://review.opendev.org/680778	21:15
openstackgerrit	James E. Blair proposed zuul/zuul master: Update gerrit pagination test fixtures https://review.opendev.org/682114	21:15
openstackgerrit	James E. Blair proposed zuul/zuul master: Support HTTP-only Gerrit https://review.opendev.org/681936	21:15
openstackgerrit	James E. Blair proposed zuul/zuul master: Add autogenerated tag to Gerrit reviews https://review.opendev.org/682473	21:15
openstackgerrit	James E. Blair proposed zuul/zuul master: Use robot_comments in Gerrit https://review.opendev.org/682487	21:15
*** rfolco|dentist is now known as rfolco		21:23
ianw	SpamapS / mordred: well it seems 10+ hours of debugging that python3 failure has come down to a single "b" character :)	21:37
SpamapS	ianw: of course it has	21:52
openstackgerrit	James E. Blair proposed zuul/project-config master: Add a third-party check pipeline to OpenDev https://review.opendev.org/682756	21:52
corvus	oops ^ wrong repo :)	21:59
corvus	i've pushed HEAD as 3.10.2 for the security fix	22:09
*** jamesmcarthur has quit IRC		22:13
openstackgerrit	Merged zuul/zuul master: Add enqueue reporter action https://review.opendev.org/681132	22:23
openstackgerrit	James E. Blair proposed zuul/zuul master: Move reference pipelines out of the quickstart https://review.opendev.org/682760	22:36
corvus	we should merge ^ asap -- i'm not sure if it's causing the quick-start instability, but it's certainly not helping debug it and it's not doing new users any favors	22:37
clarkb	hrm do zuul release notes only update when changes merge and not when we tag things?	22:41
clarkb	I'm noticing that 3.10.2 isn't in the release notes yet	22:41
clarkb	(still shows under in development)	22:41
clarkb	there is a change in the gate so I guess we will know soon if that merges and updates release notes	22:42
corvus	clarkb: yep. https://zuul-ci.org/docs/zuul/3.10.2/releasenotes.html is correct, but master is not	22:42
corvus	until a change lands	22:42
clarkb	aha	22:42
clarkb	thanks!	22:42
corvus	unintended consequence of promote	22:42
corvus	oh wait, we don't promote on tag	22:42
corvus	so yeah, i guess we need a "rebuild master docs" job on release	22:43
corvus	so we'd build it twice, once for the tag, and once so that master sees the new tag	22:43
corvus	but we can't use the same build for both because the tag may be behind master	22:43
corvus	i think this is a peculiarity of projects that put their release notes in their docs, which isn't most of the reno users?	22:43
clarkb	I think openstack reno usage publishes the release notes independent of the docs	22:44
corvus	i sent the release announce email	22:45
fungi	yes, openstack projects run a separate release notes job	22:46
fungi	independent of docs jobs for the same repos	22:46
corvus	i think we should be able to fix it with some override-checkout mojo	22:47
corvus	i'm too braindead to work it out myself right now :)	22:47
fungi	i don't think it's especially urgent, no	22:48
clarkb	ya the next update to master will fix it	22:48
clarkb	and that will happen shortly according to the zuul status page	22:48
* mnaser just barging in with ideas		22:48
mnaser	how different/far is bwrap's security model from docker and friends	22:49
mnaser	i.e. could we technically have a k8s native zuul-executor "driver" to use pods instead of bwrap	22:49
mnaser	(coming in from a security approach and not as much of a "how much code has to be done")	22:50
clarkb	I think it's quite a bit different to how people often deploy k8s but not so different than openshift's locked down pods	22:50
clarkb	for example k8s on google by default gave every pod admin access to the account in the past (I think they have changed that since)	22:50
mnaser	oof	22:50
mnaser	i was thinking more on the container side of things, that stuff could technically be locked down via serviceaccount/rolebindings	22:51
SpamapS	mnaser: bubblewrap as zuul uses it is pretty locked down. clarkb's assessment matches my own.	22:51
clarkb	specifically it is there to limit blast radius if you manage to do something on the executor via ansible in an untrusted job	22:51
fungi	yeah, at best docker and pals would be no better security-wise (they rely on the same kernel features after all)	22:52
SpamapS	Essentially, if you can break out of bwrap, you probably have a local kernel root.	22:52
clarkb	and often times the way k8s is deployed means pods are trusted to interact with the rest of the system	22:52
clarkb	we want the opposite of that	22:52
SpamapS	hm, that's not been my experience	22:52
mnaser	right but there's a lot of stuff now to avoid exactly that (networkpolicy, securitycontext, etc)	22:52
mnaser	i was just trying to compare the container vs bwrap aspect, if in some weird/odd way you could have pods _only_ instead of zuul-executors with bwraps inside them	22:53
mnaser	then you can start scaling things out far more because you're not locking down an executor's 'bwrap' to a single host	22:53
SpamapS	Unless you allow privileged: true, my experience has been that most k8s nodes are set up to be pretty safe from container escape.	22:53
mnaser	SpamapS: my only annoyance is the fact all services are exposed to everything by default	22:54
mnaser	but networkpolicy can work around that easily	22:54
clarkb	SpamapS: cloud providers like gke put cloud authentication stuff in those pods though	22:54
clarkb	SpamapS: and from that you change the config to allow privileged true and win	22:54
SpamapS	mnaser: that's life, IMO. zero trust networking is the cloud model that we follow. Everything requires auth.	22:54
fungi	well, part of the problem for scaling it to hosts other than the executor is that we share files directly into the bubblewrap containers	22:54
clarkb	I think it's more of a make things easy for users problem in popular deployments	22:54
corvus	mnaser: it's come up before. i think it would be possible, perhaps desirable for various non-security purposes to have a container-based executor, but it's going to take a lot of planning and implementation. we're talking a pretty big spec, and it's absolutely going to depend on the successful completion of the zuul-operator spec. in the mean time, i agree with clarkb and SpamapS that it wouldn't be a	22:55
corvus	security win. the more compelling reasons to do that have to do with, honestly, things like better integration with openshift.	22:55
clarkb	not inherent to k8s and openshift for example locks it down	22:55
mnaser	on the subject of the zuul-operator, i am kinda under a very pressing deadline so i have been taking time to set things up and see what a golang based operator looks like	22:55
corvus	(and yeah, fungi has hit on one of the fundamental design problems to be overcome)	22:56
mnaser	and ive discovered more useful things like the ability to literally embed another operator that uses operator-sdk	22:56
clarkb	https://cloud.google.com/kubernetes-engine/docs/concepts/security-overview#securing_instance_metadata for gke docs on the subject	22:56
mnaser	so why ask the user to install the zookeeper operator when you can quite literally just include it as a dependency _inside_ your operator and it will start managing zookeeper, so 1 operator for everything	22:56
corvus	mnaser: well, the spec addresses that with https://github.com/operator-framework/operator-lifecycle-manager	22:57
mnaser	and im not talking about an extra pod, im talking about the apis and controllers living _inside_ the zuul-operator	22:57
corvus	mnaser: would embedding be a better approach?	22:57
mnaser	yeah, but this adds up a whole bunch of other things, the zuul-operator would manage Zookeeper types inside its namespace	22:57
mnaser	i.e.: to get started, simply install the zuul-operator. and that's it. you're done.	22:58
corvus	mnaser: i thought that would come out of the lifecycle manager too?	22:58
mnaser	no OLM -- or -- "if you dont want to use OLM.... make sure you have the zookeeper and the pxc and the etc"	22:58
mnaser	it sounds like OLM is actually an extra component that will make it so you have 3 operators running in your namespace, whereas in this case, you have one	22:59
mnaser	this means you dont need anything _except_ the zuul-operator running	22:59
SpamapS	Still	22:59
mnaser	also some other fun things you can do with golang that you couldnt with the ansible one, i actually can do things like use the provided github credentials	23:00
SpamapS	that's highly optimized	23:00
mnaser	poll github	23:00
SpamapS	I respect the desire to optimize it	23:00
SpamapS	but there's a whole community to think about	23:00
mnaser	and pull down repositories it's installed into	23:00
SpamapS	98% awesome, and maintainable by ansible-knowing folks is better than 100% awesome but only 10% of Zuul users can approach it.	23:00
mnaser	so no need to necessarily list out all the repos you are using the app for (obviously some might want to explicitly list it, but it does simplify life in that manner)	23:00
mnaser	i totally understand	23:01
mnaser	anyways, i'd be happy to show what i have at some point if there's interest, i don't think i'd probably want to throw it out there to cause confusion for those seeking a zuul operator	23:02
*** tosky has quit IRC		23:11
*** jamesmcarthur has joined #zuul		23:24
openstackgerrit	Merged zuul/zuul master: Move reference pipelines out of the quickstart https://review.opendev.org/682760	23:25
*** mattw4 has quit IRC		23:34
*** igordc has quit IRC		23:39
*** jamesmcarthur has quit IRC		23:50
*** rlandy has quit IRC		23:51
