jlk | I may have messed something up here | 00:00 |
---|---|---|
mordred | jlk: pip freeze won't show setuptools | 00:00 |
mordred | jlk: pbr freeze will though | 00:00 |
jlk | good point, I'm too used to doing _everything_ inside venvs | 00:01 |
SpamapS | Ok I think I've got a pretty good working thing now | 00:01 |
mordred | jlk: (also, you might like pbr freeze more than pip freeze anyway - it'll show you git shas for anything that has them recorded in their metadata) | 00:01 |
clarkb | pip list too | 00:01 |
mordred | which is everything that uses pbr, fwiw | 00:01 |
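
A quick illustration of the difference being discussed, run inside whatever environment you want to inspect (the grep pattern is just an example; the note about git shas is from mordred's description above, and exact output format may vary by pbr version):

```shell
# pip freeze lists installed packages but omits setuptools by default
pip freeze | grep -i zuul

# pbr freeze lists the same packages, and for anything built with pbr it
# also shows the git sha recorded in the package metadata
pbr freeze | grep -i zuul
```
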
* mordred hasn't used pip freeze in a couple of years now | 00:02 | |
jlk | I have PTSD from pbr, so I haven't touched it | 00:02 |
mordred | awww | 00:02 |
SpamapS | damnit... | 00:02 |
SpamapS | KeyError: 'getpwuid(): uid not found: 1000' | 00:03 |
SpamapS | so close | 00:03 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul master: Add Dockerfile for running tests https://review.openstack.org/448314 | 00:05 |
SpamapS | well I'm 5 minutes over on EOD | 00:05 |
SpamapS | jlk: ^ that kinda works for me | 00:05 |
SpamapS | except when I try to run as non root | 00:05 |
SpamapS | not sure how to plumb in my user | 00:05 |
jlk | hrm, pbr needs more than git-core it would seem | 00:07 |
jlk | oh own-goal | 00:08 |
*** rattboi has left #zuul | 00:09 | |
*** rattboi has joined #zuul | 00:09 | |
*** rattboi is now known as rattboi-test | 00:11 | |
*** rattboi-test is now known as rattboi | 00:11 | |
openstackgerrit | Joshua Hesketh proposed openstack-infra/nodepool feature/zuulv3: Merge branch 'master' into feature/zuulv3 https://review.openstack.org/445325 | 00:27 |
jhesketh | pabelanger: ^ | 00:27 |
*** harlowja has joined #zuul | 00:42 | |
openstackgerrit | K Jonathan Harker proposed openstack-infra/zuul feature/zuulv3: Perform pre-launch merge checks https://review.openstack.org/446275 | 00:44 |
openstackgerrit | K Jonathan Harker proposed openstack-infra/zuul feature/zuulv3: Perform pre-launch merge checks https://review.openstack.org/446275 | 00:48 |
jlk | oh interesting | 01:38 |
jlk | SpamapS: on my system, I'm running docker as my user, and while it's "root" inside the container, it's actually my UID. It's writing things to the filesystem that show up as my UID/GID when I look at them outside the container. | 01:38 |
jlk | BWAHAHAHA. My container got named angry_edison and I am amused. | 01:41 |
*** harlowja has quit IRC | 03:35 | |
jeblair | jlk: watch out for vengeful_tesla | 04:16 |
SpamapS | jlk: right, but I want it to not think it is root inside. | 05:10 |
SpamapS | perhaps that is a bad idea | 05:10 |
SpamapS | though I had trouble when they ran "as root" | 05:17 |
SpamapS | 590ee83c155d zuuldev "/bin/sh -c tox" 34 seconds ago Up 32 seconds dreamy_stallman | 05:24 |
SpamapS | the hits keep on coming | 05:24 |
SpamapS | jlk: also I think the way you're doing it, you have a VM between you and docker (docker-machine) so that's likely why the ownership stays you | 05:45 |
SpamapS | for me, if I'm root in the container, volume touched files are root owned | 05:45 |
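
A sketch of one way to plumb the host user in, per the discussion above: pass the host UID/GID to `docker run`, and make sure the image has a matching passwd entry, otherwise lookups can fail exactly like the `getpwuid(): uid not found: 1000` error earlier. The image name and paths are hypothetical.

```shell
# Build-time (hypothetical Dockerfile line): create a user in the image
# whose UID matches the host user, so getpwuid() lookups succeed:
#   RUN useradd -m -u 1000 zuuldev

# Run-time: run the container as the host UID/GID and mount the source tree,
# so files written through the volume keep the host ownership
docker run --rm \
  --user "$(id -u):$(id -g)" \
  -v "$(pwd):/home/zuuldev/src" \
  zuuldev /bin/sh -c "cd /home/zuuldev/src && tox"
```
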
SpamapS | best way to slow down zuul unit tests seems to be to run it on aufs | 05:59 |
* SpamapS looks at how to make container rootfs == tmpfs | 06:00 | |
SpamapS | oo neat, --tmpfs /tmp makes it go fast | 06:03 |
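
For reference, the flag in question (image name is hypothetical; the size option is optional):

```shell
# Mount a fresh tmpfs at /tmp inside the container so the test suite's
# temp-file churn never hits the copy-on-write storage driver
docker run --rm --tmpfs /tmp:rw,size=2g zuuldev /bin/sh -c tox
```
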
openstackgerrit | Jamie Lennox proposed openstack-infra/nodepool feature/zuulv3: Refactor nodepool apps into base app https://review.openstack.org/448395 | 06:05 |
SpamapS | hrm.. running tests in my container fails for weird reasons | 06:11 |
SpamapS | Ran 51 (-6) tests in 179.562s (+39.656s) | 06:11 |
SpamapS | FAILED (id=10, failures=8 (+3)) | 06:11 |
*** isaacb has joined #zuul | 06:26 | |
*** isaacb has quit IRC | 07:13 | |
*** isaacb has joined #zuul | 07:23 | |
SpamapS | ok, docker was a huge mistake. | 08:05 |
* SpamapS deletes it forever | 08:05 | |
* SpamapS got 6 timeouts even in single-threaded mode under docker. | 08:06 | |
*** hashar has joined #zuul | 08:19 | |
openstackgerrit | Joshua Hesketh proposed openstack-infra/zuul feature/zuulv3: Remove url_pattern config parameter https://review.openstack.org/447165 | 11:08 |
openstackgerrit | Joshua Hesketh proposed openstack-infra/zuul feature/zuulv3: Simplify the log url https://review.openstack.org/438028 | 11:08 |
*** hashar is now known as hasharLunch | 11:08 | |
openstackgerrit | Joshua Hesketh proposed openstack-infra/zuul feature/zuulv3: Remove url_pattern config parameter https://review.openstack.org/447165 | 11:46 |
openstackgerrit | Joshua Hesketh proposed openstack-infra/zuul feature/zuulv3: Simplify the log url https://review.openstack.org/438028 | 11:46 |
Shrews | jhesketh: why do you think the nodepool_id feature might need to be removed? | 12:27 |
jhesketh | Shrews: I'm not sure it will.. I suspect it's useful, but once we stop running the old nodepool it may not be necessary | 12:35 |
Shrews | jhesketh: the merge, as presented by gerrit, confuses me. Are we going to lose the current working code for test_leaked_node for the old broken version if we approve that? | 12:40 |
Shrews | i really hate to see skips added back, especially for tests that are working now :( | 12:42 |
jhesketh | Shrews: it's only tests on the v3 branch | 12:42 |
jhesketh | so another commit to turn them back on | 12:42 |
jhesketh | rather than fixing the test in the merge commit making it even longer | 12:42 |
Shrews | jhesketh: i understand the skips on the new tests that v3 didn't have. it's the existing test i'm concerned about. looks to me like the merge breaks it (and then skips it) | 12:43 |
Shrews | test_leaked_node_with_nodepool_id and test_leaked_node_not_deleted are new. that's fine to fix in a later review. but test_leaked_node works now | 12:44 |
jhesketh | Shrews: okay, that's fair | 12:44 |
jhesketh | something changed in the merge to break them, but you're right, they should probably be fixed in the merge commit rather than later on | 12:45 |
jhesketh | I'll have to look tomorrow though because it's nearly midnight and this wine is nice | 12:46 |
Shrews | jhesketh: mmm, wine. enjoy! | 12:46 |
mordred | yah - I actually agree more that it may need to be removed - we don't need nodepool_id in the zk version because we track ownership via zk, no? | 12:46 |
mordred | that was the hack for v2 to not delete v3 nodes | 12:46 |
Shrews | mordred: this entirely depends on what you do with your json change :) | 12:47 |
jhesketh | plus I just got my esp8266 reading temperature correctly so I can continue writing a logger/server to monitor what might become a cellar ;-) | 12:47 |
Shrews | mordred: did you see the note i left on that review about breaking the world if you don't abandon it? | 12:47 |
Shrews | mordred: my latest comment here https://review.openstack.org/#/c/297950/ | 12:49 |
mordred | Shrews: oh - yeah - let's just abandon that for v2 | 12:50 |
Shrews | hrm, i actually don't understand the purpose of test_leaked_node_not_deleted | 12:54 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Make sure services are running for test-setup.sh https://review.openstack.org/448555 | 12:54 |
pabelanger | morning | 12:54 |
Shrews | oh, the nodepool_id test | 12:55 |
*** hasharLunch is now known as hashar | 13:05 | |
jhesketh | night all! | 13:12 |
tobiash_ | mordred: did some short testing about io footprint optimization | 13:53 |
tobiash_ | I tested with a ccached big c++ project | 13:54 |
mordred | tobiash_: how did it go? | 13:54 |
tobiash_ | a combination of increasing the ext4 commit interval (> build duration) and a set of vm.dirty_* settings saved me about 2.6GB of writes | 13:55 |
mordred | nice! the vm.dirty_ settings have to be set on the host, right? | 13:55 |
mordred | (rather than the guest) | 13:56 |
tobiash_ | mordred: I just do this at the beginning of the build job: http://paste.openstack.org/show/603765/ | 13:56 |
mordred | tobiash_: oh neat! | 13:57 |
tobiash_ | without this the dirty pages graph (which contains unwritten data) goes up and down | 13:57 |
tobiash_ | with this it goes up more or less monotonically (if there's enough ram), inhibiting most writes | 13:58 |
tobiash_ | if this proves giving good results in a wider range, these settings could just be baked into the dib image nodepool builds | 13:59 |
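
The paste isn't reproduced here, but the combination described is roughly the following; the values are placeholders, not the ones tobiash_ actually used:

```shell
# Let dirty pages accumulate in RAM instead of being flushed mid-build
sudo sysctl -w vm.dirty_ratio=80
sudo sysctl -w vm.dirty_background_ratio=75
sudo sysctl -w vm.dirty_expire_centisecs=60000

# Raise the ext4 journal commit interval (seconds) past the expected build
# duration, so the filesystem itself also defers writes
sudo mount -o remount,commit=900 /
```
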
*** isaacb has quit IRC | 14:00 | |
mordred | yah - I'm going to test to see its effects on the openstack side real quick | 14:02 |
tobiash_ | depending on what you do in the job this might need to be combined with eatmydata | 14:04 |
mordred | tobiash_: pushed up a quick test: https://review.openstack.org/448591 | 14:05 |
tobiash_ | the build time itself (did not test many parallel jobs at once) was unaffected as the writes typically are done asynchronously | 14:05 |
mordred | tobiash_: in a few of our clouds we're our own noisy-neighbor - so even if the only effect is reducing load on the underlying cloud, it still might be a win for us | 14:06 |
mordred | verifying that will be a little bit of work of course :) | 14:06 |
tobiash_ | yepp, that was my initial idea to reduce noisy-neighbor behaviour | 14:07 |
*** isaacb has joined #zuul | 14:11 | |
tobiash_ | one possible side effect of this: if so much data is unwritten that a flush is forced, so much could be written at once that dmesg logs a warning like this: | 14:11 |
tobiash_ | INFO: task jbd2/sdb1-8:612 blocked for more than 120 seconds. | 14:11 |
tobiash_ | could happen in a situation where 15gb are unwritten, 2gb are free and a process wants 8gb; then 6gb would need to be synced to disk at once | 14:13 |
pabelanger | http://paste.openstack.org/show/603769/ | 14:17 |
pabelanger | mordred: managed to get dox working^ | 14:17 |
pabelanger | https://review.openstack.org/#/c/448555/ was the only patch to zuul needed | 14:18 |
Shrews | pabelanger: dox??? oy | 14:27 |
Shrews | i'm going to have to re-learn that code now, aren't i? :) | 14:28 |
pabelanger | Shrews: :) | 14:29 |
pabelanger | it's actually not that bad | 14:29 |
pabelanger | obviously we need some new images | 14:29 |
pabelanger | but, once I got my dox.yaml file setup | 14:29 |
pabelanger | things worked as expected | 14:29 |
*** bhavik1 has joined #zuul | 15:13 | |
*** isaacb has quit IRC | 15:14 | |
jeblair | clarkb: should we abandon https://review.openstack.org/436544 now? | 15:51 |
clarkb | ya I can abandon it | 15:53 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul master: Add Dockerfile for running tests https://review.openstack.org/448314 | 15:57 |
jeblair | SpamapS: i can't keep up with your on-again / off-again relationship with docker :) | 15:58 |
SpamapS | jeblair: I know | 15:58 |
SpamapS | it's awfu | 15:58 |
SpamapS | l | 15:59 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul master: Add Dockerfile for running tests https://review.openstack.org/448314 | 15:59 |
SpamapS | jeblair: that's just me dumping my local changes and context switching back into not-docker | 15:59 |
SpamapS | The reality is.. it was kind of fun to work with | 16:00 |
SpamapS | but it can't even run our test suite fast enough to avoid the hard timeouts. | 16:00 |
jeblair | :( | 16:00 |
jeblair | SpamapS: that seems strange to me based on what i think i know about containers, but i definitely don't want to fall into that rabbit hole i see you're in, so i'm just going to look away :) | 16:01 |
SpamapS | jeblair: I'm pretty sure it's the overlay filesystem | 16:01 |
SpamapS | system CPU usage was _very_ high | 16:01 |
jeblair | ah | 16:01 |
SpamapS | could dink around with btrfs or lvm | 16:01 |
SpamapS | but... | 16:02 |
SpamapS | at some point | 16:02 |
jeblair | SpamapS: and the tmpfsing didn't help? | 16:02 |
SpamapS | running the tests on a VM works fine | 16:02 |
SpamapS | jeblair: it did a little. | 16:02 |
pabelanger | SpamapS: jeblair: do you mind reviewing 448042? that is our first step to green jobs again for zuulv3-dev | 16:02 |
SpamapS | but even with 1 thread I still got 6 alarm clocks | 16:03 |
SpamapS | pabelanger: will do that shortly | 16:03 |
jeblair | pabelanger: we haven't landed the workspace var yet have we? | 16:04 |
*** bhavik1 has quit IRC | 16:04 | |
pabelanger | jeblair: not yet, I restacked it on 448042 | 16:04 |
jeblair | cool | 16:04 |
pabelanger | that stack is now green too | 16:04 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Remove irrelevant test_merging_queues https://review.openstack.org/446768 | 16:04 |
*** bhavik1 has joined #zuul | 16:04 | |
*** harlowja has joined #zuul | 16:06 | |
*** bhavik1 has quit IRC | 16:11 | |
jeblair | pabelanger: +2s down the stack until a -1 at the end. i'll leave the +3 for you after SpamapS weighs in. i dunno if you want to ping clarkb on them too. | 16:14 |
pabelanger | jeblair: sure, more people the better! | 16:14 |
clarkb | which stack? | 16:15 |
jeblair | clarkb: starts at https://review.openstack.org/448042 | 16:15 |
jeblair | pabelanger: can you elaborate on the socket stuff you added in 446683? | 16:18 |
jeblair | pabelanger: oh, is it because we used to have something equivalent to wait until a host was up and serving ssh (while we tried to ssh and run ready script), so you want to get similar fine-grained failure messages? | 16:19 |
pabelanger | jeblair: right, with SSHClient, it would raise a socket exception, however Transport doesn't. | 16:20 |
pabelanger | So, it would result in spamming the logs: http://logs.openstack.org/83/446683/5/check/gate-dsvm-nodepool/7c4e0b6/logs/screen-nodepool.txt.gz | 16:20 |
pabelanger | and had no easy way to trap the exception | 16:20 |
pabelanger | however, open to suggestions on making it better | 16:21 |
SpamapS | pabelanger: reviewed 448042.. heading down stack | 16:22 |
* SpamapS may still be a little bit decaffeinated and thus carrying a +1 cranky buff | 16:24 | |
jeblair | pabelanger: i left a -0 on 446683; can you take a look at that; and i'd like clarkb and Shrews to take a(nother) look. | 16:25 |
pabelanger | jeblair: sure | 16:25 |
SpamapS | ew | 16:39 |
SpamapS | /tmp/console.log should be /run/console.log, FYI | 16:39 |
SpamapS | (predictable filenames in /tmp are basically always a terrible idea) | 16:39 |
SpamapS | and really /run/zuul-prepared-workspace/console.log is better (just reading roles/prepare-workspace) | 16:40 |
jeblair | SpamapS: what's the attack vector/endgame there exactly? | 16:40 |
jeblair | SpamapS: console.log could end up being rather larger than one might want on a tmpfs. | 16:41 |
jeblair | (i mean, if a job wanted to replace the console log, it could do regardless of where it's stored; it will have rights to do that) | 16:43 |
SpamapS | jeblair: it's super low risk, I know, but I prefer to viciously eliminate all use of /tmp/staticanything than try to reason about every attack vector surrounding symlinks as predictable files in /tm | 16:46 |
SpamapS | damnit, my enter is beating my letters too often | 16:47 |
SpamapS | With throwaway nodes, I know we don't have to worry | 16:47 |
SpamapS | But I don't like putting bad practice into code that others will consume and possibly cargo cult without knowing. | 16:47 |
jeblair | SpamapS: okay, but in this case, we're talking about storing potentially huge files in a tmpfs, and expanding zuul's footprint on the node. | 16:48 |
jeblair | SpamapS: i don't accept it's bad practice. :) | 16:48 |
SpamapS | jeblair: /tmp/${randstring} then, and randstring=$(cat /run/mydir/wheresmystring) | 16:49 |
SpamapS | also, /var/tmp is for big files | 16:49 |
SpamapS | /tmp is not | 16:49 |
jeblair | SpamapS: no i mean i understand the issue | 16:49 |
jlk | SpamapS: yeah I'm not sure what happens on OSX any more. It's no longer "docker machine", it's a native thing? | 16:50 |
jlk | or it's a very very well hidden vm | 16:50 |
SpamapS | jlk: oh? wild | 16:50 |
jeblair | SpamapS: what i'm saying is that this is something we have reasoned about at length, and come to a conclusion. i don't like to blindly follow conventional wisdom. i think this is something to think about carefully. | 16:50 |
jlk | ah | 16:51 |
jlk | "The Docker engine is running in an Alpine Linux distribution on top of an xhyve Virtual Machine on Mac OS X" | 16:51 |
SpamapS | jeblair: fair enough. I have not taken the time to reason about it because I've accepted that it's always a bad idea and have not been proven wrong, nor have I attempted to re-evaluate that position. I'm entirely willing to ignore this case in the face of those who have taken time to think hard about it. | 16:52 |
SpamapS | jlk: yeah, well hidden is right! | 16:52 |
clarkb | is xhyve a port of bhyve to os x? | 16:52 |
SpamapS | probably | 16:52 |
jlk | Yes | 16:52 |
jlk | https://github.com/mist64/xhyve | 16:53 |
SpamapS | a google search for 'predictable filename tmp' reveals a pretty awful list of CVE's though | 16:53 |
SpamapS | so I'd have to really want to have my mind changed | 16:53 |
* SpamapS is fully submerged in the confirmation bias now | 16:53 | |
SpamapS | crap I have a meeting in 7 minutes and 8 minutes of prep | 16:53 |
* SpamapS de-ircs for a moment | 16:54 | |
jeblair | SpamapS: i'm happy to talk about it, or alternatives, if we can fast-forward past the part where we assume the author (o/) is not aware of the issues and hasn't thought of it. :) | 16:54 |
SpamapS | jeblair: No assumption is being made about the author. Only questions from a stubborn old greybeard. ;) | 16:54 |
jeblair | SpamapS: okay. happy to continue when we have some more time (it will take a bit). | 16:55 |
Shrews | jeblair: pabelanger: i think we can go ahead and move forward with https://review.openstack.org/447108 and https://review.openstack.org/447109 today, if you two agree | 17:06 |
pabelanger | sure | 17:07 |
jeblair | ++ | 17:08 |
Shrews | great | 17:09 |
clarkb | ok I think I have gotten past "setuptools is broken" and now need caffeine and breakfast, then will review pabelanger's stack | 17:19 |
mordred | clarkb: oh good - I love it when setuptools breaks | 17:19 |
*** hashar has quit IRC | 17:29 | |
Shrews | jeblair: rbergeron: just sent you two an email regarding doc things | 17:29 |
Shrews | enjoy at your leisure | 17:29 |
* Shrews decides afternoon coffee is a good idea at this point | 17:30 | |
pabelanger | 2017-03-22 17:41:44,015 INFO nodepool.NodePool: Starting ProviderWorker.infracloud-vanilla | 17:41 |
pabelanger | Shrews: ^ | 17:41 |
Shrews | yeah. and first bug found | 17:42 |
Shrews | min-ready nodes not being started for a new provider | 17:43 |
pabelanger | ya, noticing that | 17:43 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Remove ZooKeeperConnectionConfig class https://review.openstack.org/447683 | 17:44 |
Shrews | oh, no. that's actually correct | 17:45 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Fix hostname issue with nodepool integration job https://review.openstack.org/448239 | 17:45 |
Shrews | it's per label, not provider | 17:45 |
Shrews | so yay | 17:46 |
jeblair | cool, so after some use, we should see some min-readies start to pop up | 17:46 |
Shrews | i'm going to delete a node, just to see if vanilla catches it | 17:47 |
jlk | SpamapS: where did you mount the tmpfs? | 17:47 |
Shrews | chocolate is greedy :( | 17:49 |
* Shrews waits for another jlk or jeblair patch bomb | 17:49 | |
jlk | oh no | 17:49 |
jeblair | Shrews: oh, i do have a stack that needs revising... | 17:50 |
Shrews | \o/ | 17:50 |
jlk | jeblair: You had mentioned something about using tmpfs to speed up tox, where did you make the tmpfs? in .tox/ ? | 17:53 |
jeblair | export ZUUL_TEST_ROOT=/tmpfs | 17:53 |
jeblair | jlk: i do that ^ | 17:53 |
jeblair | jlk: so you can mount one anywhere, then tell zuul unit tests to use it that way | 17:54 |
jlk | I see, and that tells, tox to dump stuff there? | 17:54 |
jeblair | jlk: it's internal to zuul's tests. so whenever zuul creates a tmpdir (like ALL THE TIME) it makes one there | 17:55 |
jlk | I see | 17:55 |
jeblair | jlk: i, er, probably could have just used TMPDIR env variable, but i don't think i realized python tmpdir respected that at the time. | 17:55 |
jeblair | it's old. | 17:55 |
jeblair | for that matter, i reckon that would probably just transparently work too; i haven't tried it. :) | 17:56 |
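
Putting the above together, a minimal recipe (mount point and size are arbitrary); TMPDIR might work too, as speculated above, but ZUUL_TEST_ROOT is the knob the tests are known to honor:

```shell
# Mount a tmpfs anywhere and point zuul's test suite at it
sudo mkdir -p /tmpfs
sudo mount -t tmpfs -o size=4g tmpfs /tmpfs
export ZUUL_TEST_ROOT=/tmpfs
tox -e py27
```
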
clarkb | Shrews: in v2 allocations were proportional to total provider quota. But I think now its whoever can respond to a request first ya? | 18:02 |
clarkb | at least at zero usage | 18:02 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Add 'allow-secrets' pipeline attribute https://review.openstack.org/447138 | 18:02 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Isolate encryption-related methods https://review.openstack.org/447087 | 18:02 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Augment references of pkcs1 with oaep https://review.openstack.org/447088 | 18:02 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Add support for job allowed-projects https://review.openstack.org/447134 | 18:02 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Serve public keys through webapp https://review.openstack.org/446756 | 18:02 |
jlk | hrm, 4 minutes in docker to run pep8 | 18:03 |
Shrews | clarkb: yeah. though each request is for a single node, so that gives each provider at least an opportunity to satisfy it | 18:03 |
pabelanger | that did something for infracloud-vanilla | 18:03 |
Shrews | cool, i see vanilla nodes | 18:03 |
jeblair | and providers should respond more slowly as they get busier; if that's not enough, we can borrow a gearman trick from zuul v2.5 and start adding proportional sleeps to the algorithm. | 18:04 |
jlk | LOL vs 32 seconds on the VM. WTF. | 18:04 |
clarkb | I approved the bottom change of pabelanger's stack (please let me know now if there are still more reviewers interested in it but looks like it got a lot of review) | 18:05 |
pabelanger | clarkb: great. I think you are the last reviewer atm. So, you should be safe to go up to 441617 | 18:06 |
clarkb | thats what I thought, thanks | 18:06 |
pabelanger | Shrews: but, if we had a nodepool-launcher, per provider, each would have 2 min-ready nodes, right? | 18:07 |
pabelanger | or still 2 across launchers | 18:07 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Add 'allow-secrets' pipeline attribute https://review.openstack.org/447138 | 18:07 |
Shrews | pabelanger: no. it would still see 2 ready 'label' nodes | 18:08 |
pabelanger | k | 18:08 |
Shrews | pabelanger: assuming the labels in each config were identically named, that is | 18:09 |
pabelanger | right | 18:09 |
clarkb | pabelanger: et al one thing I notice reading http://zuulv3-dev.openstack.org/logs/1ce3b8446a594d8c8f07092786aba219/console.log to review that change is we don't annotate the console log with what script is being run | 18:11 |
clarkb | not sure if we want to have the logger handle that globally or just modify our scripts to echo some info about themselves as a form of header or what, but it would make following the console log flow easier I think | 18:11 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Create extra-test-setup role https://review.openstack.org/448042 | 18:13 |
pabelanger | clarkb: Ya, so that is an interesting problem now. Because there isn't any stdout when ansible runs the script, which means zuul_stream cannot append it to console.log. I'm not sure how best to fix that | 18:14 |
jeblair | mordred: ^ | 18:14 |
pabelanger | and agree, it will be confusing for people that just look at console.log | 18:14 |
jeblair | i know that if we *streamed* the log, we would see what was actually being run | 18:14 |
jeblair | the solution might be to copy something produced by the callback plugin rather than the console log on the host? | 18:15 |
clarkb | another log observation is we don't seem to capture where the job ran? | 18:16 |
pabelanger | ya, we need to add that still | 18:16 |
pabelanger | on my list | 18:16 |
clarkb | in one way that's nice: this was an "ubuntu-xenial" host and details are abstracted. On the other hand, the two py27 jobs took vastly different amounts of time to run and I'm wondering if that's related to cloud/region or the changes themselves etc | 18:17 |
clarkb | pabelanger: for that I think a host info role like our net info macro in jjb probably makes sense | 18:17 |
clarkb | just echo the hostname and some basic host networking information | 18:17 |
pabelanger | ++ | 18:18 |
pabelanger | going to do something like: http://logs.openstack.org/17/441617/10/check/gate-zuul-pep8-ubuntu-xenial/621bbeb/_zuul_ansible/pre_playbook | 18:18 |
pabelanger | and use zuul_stream | 18:18 |
pabelanger | I'll do that now | 18:18 |
clarkb | what is zuul_stream? | 18:20 |
pabelanger | our process that runs on the worker to add things into console.log | 18:20 |
pabelanger | mordred: created it as an ansible task | 18:21 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Create zuul_workspace_root job variable https://review.openstack.org/441441 | 18:21 |
clarkb | I remember reviewing those changes, but there were many and the beginning state was completely different than the end state so I'll admit to not really having it all sorted out | 18:21 |
pabelanger | actually, I think it changed a little from zuulv2.5 | 18:21 |
clarkb | ya its different | 18:21 |
pabelanger | so maybe just an echo like you said | 18:22 |
Shrews | n-l handled that workload well with the additional provider. chalking that up as a success. | 18:22 |
clarkb | iirc we fork a process on the nodepool node and every playbook that runs there writes to a socket? | 18:22 |
clarkb | and the forked process is on the other end of that socket reading the data as each playbook runs and it writes it to console log? so I think you just have to echo ya | 18:22 |
clarkb | rather than use a special annotation | 18:23 |
clarkb | pabelanger: or does it run on the zuul launcher itself? I think its the nodepool node | 18:23 |
jeblair | zuul_stream is the callback plugin which runs on the executor. | 18:24 |
jeblair | when i suggested that instead of saving the console log, we should save the output of 'the callback plugin' that's what i was referring to | 18:24 |
clarkb | jeblair: so there is a forked process for every job on the executor with an open socket reading the writes from the job itself? | 18:24 |
mordred | jeblair: reading | 18:25 |
clarkb | the callback plugin is running in the context of ansible execution on the executor, but wondering where the forked process that reads from that is | 18:25 |
clarkb | (thats the bit I am currently confused about, though its not super important here) | 18:26 |
jeblair | clarkb: the callback plugin forks a process to read the stdout over TCP from each *node* | 18:26 |
clarkb | gotcha thanks | 18:27 |
jeblair | (so we don't fork on every ansible 'command' execution -- just on the first ansible 'command' execution for a given node) | 18:27 |
clarkb | ya | 18:27 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Add revoke-sudo role and update tox jobs https://review.openstack.org/441467 | 18:27 |
pabelanger | ^ was the task I meant that will not show up in console.log | 18:27 |
mordred | jeblair: yes - I believe that is actually the right way forward - I have not done that yet because I think we'll wind up doing that as a matter of course when we plumb streaming all the way through | 18:28 |
mordred | jeblair: that is - the thing we wind up writing to console.log should be the same thing as what we do from our streaming | 18:28 |
clarkb | pabelanger: https://review.openstack.org/#/c/441617/10/playbooks/roles/prepare-workspace/tasks/main.yaml is roughly where I would collect the data | 18:29 |
clarkb | pabelanger: have a dump of it all in one place, hostname, networking, etc | 18:29 |
jeblair | mordred: where does the output of zuul_stream go right now? | 18:29 |
clarkb | (but I agree that a host-info role is appropriate rather than part of workspace info) | 18:30 |
pabelanger | ++ | 18:30 |
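
The sort of output being asked for, expressed as the shell a host-info/net-info role could run at the start of a job; the exact commands are a guess at what "hostname and some basic host networking information" would cover:

```shell
echo "=== host info ==="
hostname -f
uname -a
echo "=== network info ==="
ip addr show
ip route show
```
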
mordred | jeblair: hrm - actually, that should be what's writing console.log | 18:30 |
mordred | jeblair: so, ignore me - this should already be the case | 18:31 |
jeblair | mordred: zuul_log writes /tmp/console.log on the worker node | 18:32 |
mordred | http://zuulv3-dev.openstack.org/logs/1ce3b8446a594d8c8f07092786aba219/ansible_log.txt | 18:32 |
jeblair | mordred: zuul_stream runs on the executor and reads /tmp/console.log and interpolates it with normal ansible output | 18:32 |
mordred | yes. normal ansible output goes to ansible_log.txt on the executor | 18:32 |
jeblair | mordred: okay, so zuul_stream should be going to that, yeah | 18:32 |
jeblair | mordred: i don't see the console output there? | 18:32 |
pabelanger | so, in the case of: https://review.openstack.org/#/c/441467/20/playbooks/roles/revoke-sudo/tasks/main.yaml I think we should have added a shell: echo "Remove sudo access for zuul user.", so console.log would have seen it | 18:33 |
mordred | 2017-03-21 20:49:31,202 p=6495 u=zuul | [WARNING]: Failure using method (v2_playbook_on_task_start) in callback plugin | 18:33 |
mordred | (<ansible.plugins.callback.zuul_stream.CallbackModule object at | 18:33 |
mordred | 0x7f1b8d8f0550>): all not found in hostvars | 18:33 |
mordred | well- there's at least one issue in there-although I think jamielennox has a patch up to deal with that | 18:33 |
pabelanger | ya, we haven't restarted zuul yet | 18:33 |
jeblair | mordred: yeah, that was I9274a2098348b736198e5fea344f078ee0404b41 which merged | 18:34 |
mordred | cool | 18:34 |
mordred | there is likely a bug then - will start staring at code | 18:34 |
jeblair | cool | 18:34 |
mordred | it SHOULD be in there - that said, that ansible_log.txt file is ugly - so we may also want to do additional things | 18:35 |
jeblair | mordred, clarkb, pabelanger: so aiui, we should stop copying /tmp/console.log in our post-playbook, and instead, copy the ansible_log.txt file as its replacement. | 18:35 |
jeblair | mordred: yeah, then i think we should look at maybe doing something sane with the rsync output. that's the biggest unreadable mess. | 18:36 |
mordred | yes | 18:36 |
mordred | to both things | 18:36 |
jeblair | SpamapS: that actually removes one of the main drivers for having /tmp/console.log be a known location. after we do that, i think we can feel free to make it a regular anonymous tmpfile. :) | 18:36 |
mordred | we actually should be able to just stop copying anything in the post-playbook - ansible writes it directly into the workspace already | 18:36 |
SpamapS | jeblair: neat. :-D | 18:37 |
jeblair | mordred: er yeah, that's a good point. i mean, we did just link directly to the file. :) | 18:37 |
mordred | jeblair: so - 2 things to sort - a) make that rsync output less ugly | 18:37 |
SpamapS | jeblair: I've since had the requisite two cups of coffee, so my beard is a bit less grey and I'm far less cranky. :) | 18:37 |
mordred | jeblair: b) figure out why the console output isn't showing up in there in the first place | 18:37 |
mordred | SpamapS: you only require 2 ? | 18:37 |
jeblair | SpamapS: storing coffee in your beard for later? :) | 18:38 |
clarkb | re rsync maybe just summarize "x bytes transfered into y files" ? | 18:38 |
mordred | jeblair: I think large pile of # 127.0.0.1:22 SSH-2.0-OpenSSH_7.2p2 Ubuntu-4ubuntu2.1 lines are super helpful too :) | 18:38 |
pabelanger | Oh, I figure that out | 18:38 |
pabelanger | it was ssh-keyscan | 18:38 |
jeblair | mordred: oh that's actual stdout from the test | 18:38 |
mordred | jeblair: yay! | 18:38 |
jeblair | mordred: so -- yes, we should get rid of it because it's annoying stuff in our unit tests, but from a zuul arch point of view, that is correct output from the job which should be included. :) | 18:39 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Organize playbooks folder https://review.openstack.org/441547 | 18:39 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Create tox-tarball job https://review.openstack.org/441609 | 18:39 |
mordred | we could just plop a nolog onto our rsync of the git repos | 18:39 |
jeblair | mordred: and as pabelanger says, it's in the process of being removed. | 18:39 |
jeblair | mordred: cool, that sounds nice and easy | 18:40 |
mordred | yah | 18:40 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Silence command warnings https://review.openstack.org/448748 | 18:40 |
mordred | speaking of ^^ that's a meaningless change but just happened to notice when I was looking at something else - I'm also happy if people don't think it's a good idea | 18:41 |
jeblair | oh i wish i knew that for v2.5 :) | 18:41 |
clarkb | is the "all not found in hostvars" the thing you were saying was fixed? | 18:41 |
clarkb | and if so, next question is why did that not make the job fail? | 18:42 |
jeblair | clarkb: i think it's just in the callback plugin which isn't critical? | 18:42 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Stop logging git repo rsync output https://review.openstack.org/448750 | 18:42 |
mordred | yes. the callback plugin is "just for logging" | 18:42 |
clarkb | I'm conflicted by that statement :) | 18:43 |
clarkb | if logging doesn't work then I have no way of knowing a success was legit | 18:43 |
clarkb | I think logging not working should be a failure? | 18:43 |
mordred | when we get zuul to start interpreting the output of running stuff, we should likely figure out a way to trap for that and fail hard | 18:43 |
mordred | clarkb: yes. I agree | 18:43 |
mordred | that's why I put it in scare-quotes | 18:43 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Stop copying console.log https://review.openstack.org/448752 | 18:45 |
jeblair | That's just a simple remove but i put in a todo about the tmpfile thing | 18:45 |
SpamapS | mordred: yes, 2 cups == human sauce ... more and I become a _pleasant_ human | 18:46 |
* SpamapS afk's | 18:46 | |
jeblair | but i'm going to WIP that because we shouldn't land it until mordred fixes the callback plugin | 18:46 |
mordred | before we investigate TOO much further, it might be nice to restart with jamielennox's change applied | 18:47 |
jeblair | oh! restart! | 18:47 |
jeblair | i forgot about that | 18:47 |
jeblair | we so rarely have to now :) | 18:47 |
mordred | the original testing was that stuff got written to ansible_log.txt | 18:47 |
mordred | jeblair: I know! it's exciting | 18:47 |
pabelanger | okay, a little confused about 448752 | 18:49 |
pabelanger | so, what is the console.log used for? | 18:49 |
pabelanger | just live streaming I guess | 18:49 |
jlk | SpamapS: alright I think I'm going to give up on Docker Toxxer too, at least on OSX. | 18:51 |
Shrews | jlk: you should give up on OSX | 18:52 |
Shrews | i just can't use it for real development anymore | 18:52 |
jlk | heh, I really don't feel like going through the pain of re-imaging this laptop | 18:53 |
jeblair | mordred: restarted | 18:53 |
Shrews | jlk: buy a new laptop! :) | 18:53 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Stop copying console.log https://review.openstack.org/448752 | 18:53 |
jlk | Shrews: you mean make IBM buy me a new laptop | 18:53 |
Shrews | yes. that. | 18:53 |
jlk | I haven't actually owned my own laptop since... 2002~ | 18:54 |
mordred | pabelanger: yes - console.log on the remote host is where things write content so that the log stremer can stream it | 18:55 |
mordred | pabelanger: we do things in the command and shell modules to cause the stdout/stderr to go there instead of being returned in the ansible return structure | 18:56 |
mordred | pabelanger: then in the zuul_stream callback plugin on the executor we open a socket connection to the daemon on the remote node and read the stdout/stderr from it and inject it into the output | 18:56 |
mordred | pabelanger: it's a bit of a strange dance - it's probably worth a diagram at some point | 18:57 |
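
Conceptually, the reader that zuul_stream forks per node does little more than the following; hostname and port are placeholders, not values taken from this log:

```shell
# Connect to the console-streaming daemon on the worker node and append
# whatever it streams (the worker-side console.log) into the job's log on
# the executor
nc "$WORKER_NODE" "$CONSOLE_STREAM_PORT" >> ansible_log.txt
```
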
pabelanger | do we think people will not be confused between the 2 outputs? stream and archived logging? | 18:57 |
Shrews | that screams "diagram please" | 18:57 |
clarkb | jlk: double check you have vmx? | 18:57 |
clarkb | if speed is the only problem, looks like bhyve doesn't actually require vmx for single cpu VMs so you might be getting slow emulation | 18:58 |
jlk | clarkb: it's a known issue with OSX and mounted host volumes | 18:58 |
jlk | it's just super slow. There are some hacky go-arounds, like using unison to sync files into the container rather than do a volume mount. That made it much faster, but tests that should pass are just failing | 18:58 |
jeblair | pabelanger: let's see if we can get a good example and see if it's confusing. | 18:58 |
*** harlowja has quit IRC | 18:59 | |
pabelanger | agree! I think diagram will help too | 18:59 |
jlk | I suppose it could be me testing on Fedora rather than Ubuntu | 18:59 |
jeblair | pabelanger: well, a diagram will help us understand how it works. a diagram must not be necessary to help a user understand the output. | 18:59 |
*** harlowja has joined #zuul | 19:00 | |
*** harlowja has quit IRC | 19:01 | |
jeblair | pabelanger, mordred: http://zuulv3-dev.openstack.org/logs/07206b6a6d514c40b4852254e2993f8a/ansible_log.txt is much better | 19:08 |
jeblair | though it seems to be missing the tox output? | 19:09 |
Shrews | jeblair: i wonder if we should add retry logic to our delete* zk api methods? that kazoo recursive delete() issue keeps popping up: http://logs.openstack.org/95/448395/1/check/nodepool-coverage-ubuntu-xenial/86a50ab/console.html#_2017-03-22_06_07_58_052595 | 19:11 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Add net-info role https://review.openstack.org/441617 | 19:11 |
jeblair | Shrews: the shared locks would let us fix that, right? | 19:12 |
Shrews | jeblair: yeah, it would | 19:12 |
Shrews | assuming it ever gets merged :) | 19:13 |
jeblair | Shrews: maybe as a temporary measure, so we can land changes? :) | 19:13 |
jeblair | Shrews: oh, it'll get merged somewhere | 19:13 |
Shrews | jeblair: let's hold off a bit then. it doesn't bite us too terribly often. if it gets worse, then yeah | 19:13 |
jeblair | pabelanger, mordred: it looks like maybe we only got the console log for the first task? | 19:14 |
jeblair | pabelanger, mordred: oh! it's because we have multiple playbooks | 19:17 |
jeblair | pabelanger, mordred: the zuul_stream log reading subprocess terminates at the end of a playbook, but the daemon_stamp file still exists, so the next zuul_stream callback (for the next playbook) does not launch a new subprocess. | 19:17 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Add net-info role https://review.openstack.org/441617 | 19:18 |
jeblair | we either need to make the stamp file specific to a playbook (where does it get written anyway?), or clean it up properly. | 19:18 |
mordred | jeblair: OH! | 19:19 |
clarkb | could we avoid the stamp file entirely and have ansible parent process kill the child using atexit? | 19:19 |
jeblair | clarkb: the stamp is so that two tasks within the same playbook don't each start threads | 19:20 |
jeblair | er, processes | 19:20 |
jeblair | (heh, if it were threads this would be easy) | 19:20 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Add net-info role https://review.openstack.org/441617 | 19:21 |
clarkb | right instead of using a file though ust do process management? | 19:21 |
clarkb | eg "do I have a child log worker process if no then create one" then later when process dies kill its child so it doesn't zombie | 19:21 |
jeblair | clarkb: how do we prevent a second process from running for a second task? | 19:21 |
pabelanger | readying backscroll | 19:22 |
pabelanger | reading* | 19:22 |
jeblair | clarkb: oh i think i see what you're saying -- keep track of subprocesses in memory in the callback. i'm going to turn this over to mordred because i'm fuzzy on the ansible internal details here. :) | 19:23 |
clarkb | jeblair: ya either that or query the operating system for members of the process group and filter | 19:23 |
*** bhavik1 has joined #zuul | 19:24 | |
jeblair | clarkb, mordred: i will say that part of the reason we don't want to open a new tcp connection for each task is that we get the whole stream again each time. so actually, we probably don't want to just naively make the callback plugin do another subprocess for a second playbook -- we'll get a copy of everything that has come before. | 19:26 |
* mordred is trying to remember if there was a reason we didn't do it that way originally | 19:26 | |
clarkb | jeblair: oh interesting | 19:26 |
mordred | yah | 19:26 |
clarkb | jeblair: even though they are bound by different invocations of ansible-playbook? | 19:26 |
mordred | it's because on the remote host we don't have any idea that the first ansible-playbook stopped | 19:27 |
jeblair | mordred: maybe we thought that this would span playbooks and that's why we made a stamp file, but the error is that we weren't expecting the subprocess to die at the end of the playbook? | 19:27 |
mordred | yes. I think we made that logic error | 19:27 |
jeblair | (i'm assuming it does die, but i'm only inferring that from log output) | 19:27 |
mordred | which is pretty obvious now in hindsight | 19:28 |
mordred | I mean - we launch a new subprocess for each playbook | 19:28 |
jeblair | mordred: yes, though we set p.daemon, and without looking that up in the manual -- i might make assumptions as to what it does. :) | 19:28 |
mordred | yah. | 19:28 |
jeblair | to be clear, i'm going to modify and repeat an earlier thing i typed: | 19:29 |
jeblair | mordred: maybe we thought that this would span playbooks and that's why we made a stamp file, but the error is that we weren't expecting the *log streaming* subprocess to die at the end of the playbook? | 19:29 |
jeblair | (just in case that was ambiguous as to which 'subprocess' i meant there) | 19:29 |
*** bhavik1 has quit IRC | 19:29 | |
jeblair | "When a process exits, it attempts to terminate all of its daemonic child processes." | 19:30 |
mordred | jeblair: yes - I think that may be the case | 19:30 |
jeblair | so, yeah, according to docs, it seems it's expected for our ("daemon") log-streaming subprocess to die | 19:30 |
jeblair | (this is via the multiprocessing module) | 19:31 |
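
A quick way to see the documented behaviour quoted above: a multiprocessing child started with daemon=True is terminated when its parent exits, which is why the streaming subprocess dies with each ansible-playbook run. This is just a stand-in demonstration, not zuul's actual code:

```shell
python -c '
import multiprocessing, time

def stream_forever():
    # stand-in for the log-streaming subprocess
    time.sleep(60)

p = multiprocessing.Process(target=stream_forever)
p.daemon = True
p.start()
print("parent exiting; daemonic child gets terminated with it")
'
```
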
jeblair | i have to forage for food now | 19:31 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Add net-info role https://review.openstack.org/441617 | 19:32 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Add net-info role https://review.openstack.org/441617 | 19:33 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Add net-info role https://review.openstack.org/441617 | 19:35 |
pabelanger | k, sorry for noise | 19:35 |
pabelanger | the only piece we might want, is info about which zuul-executor was used | 19:35 |
pabelanger | but, we don't log that currently | 19:36 |
*** hashar has joined #zuul | 19:40 | |
mordred | jeblair: ugh. so it's really caused by the fact that we used multiprocessing - which we used because trying to get a multiprocessing subprocess from ansible to spawn a daemon process the way you did in the log streamer originally died in a fire | 19:40 |
mordred | jeblair: ping me when you get back from food | 19:41 |
*** harlowja has joined #zuul | 19:45 | |
jlk | pabelanger: when using dox, did you just pip install it, or install from git? | 19:56 |
pabelanger | jlk: git | 19:59 |
pabelanger | I also had to rebuild the image from docker | 19:59 |
pabelanger | I was going to push up some reviews tonight | 19:59 |
jlk | okay. I wanted this to work on Fedora, but it might not. :( | 20:00 |
pabelanger | I ran dox from fedora-25 | 20:00 |
jlk | I don't know if that's th eissue. | 20:00 |
pabelanger | but used xenial images | 20:00 |
jlk | I meant the image _in_ docker | 20:00 |
pabelanger | containers* | 20:00 |
pabelanger | ya, I haven't tried fedora yet | 20:00 |
jlk | like I wanted tox to pass on fedora-d25 | 20:00 |
jlk | I don't know if that's my problem locally. | 20:00 |
pabelanger | Ya, I've been testing on fedora-25 too locally, tox does work | 20:02 |
pabelanger | I'll try fedora-25 dox later tonight too | 20:03 |
jlk | okay, well then it's probably just that docker on osx is too unstable for the py27 tox target. pep8 worked, and I got it reasonably fast, but py27 falls over for unknown reasons | 20:07 |
mordred | jlk: you saw that SpamapS had timeouts with his docker too - his hunch was aufs | 20:12 |
jlk | hrm | 20:14 |
jlk | COFFEEEEEEE | 20:15 |
mordred | yah man | 20:15 |
mordred | I'm doing that right now | 20:15 |
*** harlowja has quit IRC | 20:18 | |
pabelanger | need some coffee too... or I should just nap | 20:19 |
*** hashar has quit IRC | 20:23 | |
*** hashar has joined #zuul | 20:23 | |
jlk | okay, maybe I can think clearly now. | 20:34 |
mordred | jlk: just to be sure - have another mug | 20:39 |
jeblair | mordred: i am sufficiently burritoed. | 20:42 |
mordred | jeblair: woot! | 20:42 |
mordred | jeblair: so - I have a call with cdub in 18 minutes | 20:42 |
mordred | jeblair: but - I have been thinking about our issue ... and have 2 ideas | 20:44 |
mordred | jeblair: one is to make the streaming interaction with the worker richer - like, rather than telnet/netcat, have it be more complex, maybe with a playbook id that can get passed in so that playbooks can request to start at a point in time or something | 20:45 |
mordred | jeblair: which seems like a lot of work and probably more and more fragile - but ultimately is still doable | 20:45 |
mordred | jeblair: but the thing I like, which is even more hacky but likely to maybe/probably be simpler/more resilient | 20:46 |
mordred | is to spin up a "logging" thread/subprocess on the zuul-side which runs a playbook that basically does the zuul_log module on all of the hosts in the inventory, then just starts streaming the results back into the ansible_log.txt file like the daemon process does now | 20:47 |
mordred | and have zuul kill that subprocess when it's done executing the job | 20:48 |
jeblair | mordred: re the first thing -- that would mean having the zuul_log plugin thingy annotate the console.log with metadata, yeah? or maybe having it write out different streaming output files for each playbook (then obviously being able to serve each of them)? | 20:48 |
mordred | jeblair: yes - one of those two things | 20:48 |
jeblair | mordred: i was thinking of the problem and had two thoughts as well -- | 20:48 |
mordred | neat! | 20:48 |
jeblair | mordred: one was that part of the "start from beginning" thing was for humans, and we don't need that anymore on the worker node. so we can maybe think about dropping that. but we still have synchronization issues (like the worker starts recording data before the callback starts fetching it), so maybe we still need it. | 20:49 |
jeblair | mordred: the second was almost exactly what your second idea was. :) | 20:49 |
jeblair | mordred: so that's two votes for that. :) | 20:50 |
mordred | jeblair: woot! maybe we should explore that one for now then | 20:50 |
jeblair | mordred: sounds like a plan | 20:51 |
mordred | woot | 20:51 |
mordred | I'll think about it in earnest after my next call | 20:51 |
*** harlowja has joined #zuul | 21:33 | |
jlk | Anybody seen anything like "Exception: Job project-gerrit in common-config is not permitted to shadow job project-gerrit in common-config" ? | 21:49 |
SpamapS | not I | 21:50 |
jlk | okay cool, so this is definitely broken | 21:53 |
jeblair | jlk: no; zuul emits that when two jobs with the same name appear in two different repos. so it suggests zuul thinks that the first "common-config" is different than the second "common-config" repo in some way. | 21:57 |
jlk | that's... bizarre | 21:57 |
jeblair | jlk: (maybe it appeared twice in the repo listing in the tenant? i don't think we have any sanity checking around that yet) | 21:58 |
jlk | http://paste.openstack.org/show/603828/ is the zuul.yaml | 21:59 |
jhesketh | Morning | 21:59 |
jlk | oh so | 22:00 |
jlk | maybe | 22:00 |
jlk | http://paste.openstack.org/show/603829/ | 22:00 |
jlk | that's the tenant config | 22:00 |
jeblair | yep | 22:00 |
SpamapS | oh man | 22:01 |
SpamapS | I broke subunit | 22:01 |
SpamapS | Length too long: 18335405 | 22:01 |
jlk | jeblair: I took a guess on the tenant config, what's the right way to list multiple sources? | 22:01 |
jlk | can a tenant have more than one connection? | 22:02 |
jeblair | jlk: i think in production that should work, but the test framework probably won't deal with that correctly -- they'll need different git repo names. also, in general, we're going to have a devil of a time with identical repo names until we implement http://lists.openstack.org/pipermail/openstack-infra/2017-March/005208.html | 22:04 |
jeblair | jlk: so yes, can have multiple connections, will just (temporarily) need distinct git repo names across them | 22:04 |
jlk | I can see the 'repos to test" needing to be unique, but I think where it's falling over is that there needs to be different repos to grab configuration from | 22:05 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: [WIP] Add net-info role https://review.openstack.org/441617 | 22:05 |
jeblair | jlk: i agree, while it's possible/likely that the test framework will not do the correct thing with the underlying git repos if they have the same name, that probably isn't the actual issue you are hitting here (unless you have distinct content in those two identically named repos -- then zuul would probably read the same content twice because of that error) | 22:07 |
jeblair | jlk: the actual error is that the model.Project object associated with each of those is the same, despite being from two different connections | 22:07 |
jlk | they aren't identically named repos, they're literally the same repo. I was approaching this from a "centrally configured service that works with multiple project locations" | 22:08 |
jlk | so one configuration repo, handling projects that live in more than one connection location | 22:08 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: [WIP] Add net-info role https://review.openstack.org/441617 | 22:08 |
jeblair | jlk: but one of them is accessed via gerrit, and one is acccessed via github... | 22:08 |
jlk | the config repo? It's accessed by git. | 22:09 |
jlk | unless I'm really misunderstanding something | 22:09 |
jlk | I shouldn't have to host my zuul configuration in somebody's gerrit in order to connect to that gerrit | 22:09 |
jeblair | jlk: that file tells zuul where to find all the repos it works with. it's a list of each connection, and for each connection, a list of repos accessed via that connection. | 22:10 |
jlk | hrm. | 22:10 |
jeblair | jlk: so that config says "work with the common-config repo on the github server" and "work with the common-config repo on the gerrit server" | 22:11 |
jlk | My brain hasn't caught up with the whole "config in git" world. | 22:11 |
jlk | I'm trying to replicate where all the config lives on the local filesystem | 22:11 |
jeblair | jlk: the "git" driver will do that for you | 22:11 |
jlk | like, I want _one_ place to put my zuul configuration, even if that configuration is used by multiple connections | 22:11 |
jeblair | jlk: i think there's still a mismatch | 22:11 |
jeblair | jlk: the zuul configuration isn't used by connections -- it's global. it loads bits of the config from every git repo it knows about. | 22:12 |
jeblair | jlk: regardless of which connection it uses to access each of those repos | 22:12 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: [WIP] Add net-info role https://review.openstack.org/441617 | 22:13 |
jeblair | jlk: so the part of the config you load from github can be used by projects in gerrit, and vice versa | 22:13 |
jeblair | (it all goes in to one pot) | 22:13 |
jlk | okay, okay... | 22:13 |
jlk | so I should be able to list ONE config source, and it may list multiple pipelines, which use multiple drivers | 22:14 |
jlk | and I could use the "git" source in tenant config to deliver it by raw git, and not bother with trying to go through "github" | 22:14 |
jeblair | jlk: the second thing yes. the first thing i'm still having trouble with. :) | 22:15 |
jeblair | jlk: mostly because zuul won't know about any projects that aren't listed in the tenant config. | 22:16 |
jlk | er... | 22:17 |
jeblair | jlk: http://paste.openstack.org/show/603830/ is valid | 22:17 |
jlk | but we don't list projects in the tenant config | 22:17 |
jeblair | jlk: if that's what you're getting at | 22:17 |
jeblair | jlk: (then foo and bar can both be enqueued into a pipeline defined in project-config) | 22:17 |
jlk | I should re-state. | 22:17 |
jlk | _I_ haven't been listing project repos in the tenant config, I've only been listing them in the zuul.yaml file within the config repo | 22:18 |
jlk | I may have been doing this all wrong! | 22:19 |
jeblair | jlk: true, we list 'repos' in the tenant config, and 'projects' in zuul.yaml. however, it turns out that they need to refer to the same objects in memory anyway, and it's a little confusing, so in that email i'm proposing we start using the word 'project' in the tenant config as well. | 22:19 |
jlk | so what I've had that seems to be passing tests, is | 22:19 |
SpamapS | jeblair: when you say "replace NullChange with enqueueing Refs" .. just so I know we're on the same page.. what you meant was to enqueue a Ref that represents HEAD of the given project yes? | 22:20 |
jeblair | jlk: really every project should show up two places: once in the tenant config ("main.yaml") to tell zuul to go fetch it and read its .zuul.yaml file, and at least once in a zuul.yaml or .zuul.yaml file within one of those projects. [we will probably need to make exceptions for foreign projects in the third-party ci case, but let's ignore that for this conversation] | 22:20 |
jlk | http://paste.openstack.org/show/603831/ | 22:20 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: [WIP] Add net-info role https://review.openstack.org/441617 | 22:21 |
jeblair | SpamapS: yes, or possibly the tip of a given branch. i think that will make periodic jobs make much more sense (zuul says "test this ref. don't worry about where it's from") | 22:21 |
SpamapS | jeblair: great. Just checking before I fall down the "how do I ask Gerrit for that" hole | 22:21 |
jeblair | SpamapS: might be able to ask a zuul merger too (that might make it slightly more driver independent) | 22:22 |
jeblair | jlk: you should list org/one-job-project in main.yaml as well; at the very least, it won't be able to have a .zuul.yaml file if it's not listed there. | 22:23 |
jeblair | pabelanger: i've flipped to a -1 on https://review.openstack.org/447647 can you see my comment, please? | 22:24 |
jlk | right, we aren't... testing that path yet. | 22:24 |
jlk | it just seemed to "work" as far as unit tests are concerned. | 22:24 |
jeblair | pabelanger: i think we need a working gate-zuul-nodepool job to see the failure i mentioned. | 22:24 |
SpamapS | jeblair: Ah, ok. Hadn't thought of that. Currently changing zuul/gerrit/source.py to stop spewing NullChanges though, so right now I think it's ok. | 22:24 |
jlk | because the entirety of the config was in the common- repo | 22:24 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: [WIP] Add net-info role https://review.openstack.org/441617 | 22:24 |
SpamapS | our zuul/source/gerrit.y maybe? | 22:24 |
pabelanger | jeblair: sure. in fact, it should be fixed now, I can check experimental | 22:25 |
jeblair | jlk: yeah, i can see how that would work. | 22:26 |
jlk | AHAHAHAHAHA | 22:26 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: [WIP] Add net-info role https://review.openstack.org/441617 | 22:26 |
jlk | oooops | 22:26 |
jlk | jeblair: so yeah, I just tried to list the project repo in main.yaml | 22:26 |
jeblair | jlk: we may even want to keep that working for the third-party-ci foreign-project case. but generally speaking, we'd want to list all the projects there so that they can have .zuul.yaml files. | 22:26 |
jlk | and ran into a lovely: File "zuul/driver/github/githubsource.py", line 81, in getProjectBranches raise NotImplementedError()" | 22:27 |
jeblair | jlk: that sounds useful! | 22:27 |
jlk | looks like I clearly haven't added the capability to fetch zuul files from project repos to the github driver yet. | 22:27 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: [WIP] Add net-info role https://review.openstack.org/441617 | 22:27 |
jeblair | jlk: that's a new thing in v3. it's "list all the branches of a project, so i can come right back and ask for a .zuul.yaml file on every one of them". | 22:27 |
jlk | okay well, I'll go further down the path of not listing the projects there yet | 22:28 |
jlk | because I removed it, and got the config to load (and the test fails later down the road, but that's a good start) | 22:28 |
jeblair | cool | 22:28 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: [WIP] Add net-info role https://review.openstack.org/441617 | 22:29 |
jlk | jeblair: thank you for the guidance! | 22:29 |
pabelanger | Okay, neat! | 22:30 |
jeblair | jlk: np! | 22:30 |
pabelanger | I figured out the hostname of zuulv3-dev from ansible | 22:30 |
pabelanger | however... | 22:30 |
pabelanger | I think it is a security issue | 22:30 |
pabelanger | https://review.openstack.org/#/c/441617/ | 22:30 |
pabelanger | pretty sure we don't want to allow that ^ | 22:30 |
jlk | a lookup outside of the safe path? | 22:31 |
pabelanger | specifically the lookup for executor | 22:31 |
pabelanger | ya | 22:31 |
jlk | isn't this run trusted? | 22:31 |
jlk | or is this the untrusted bits? | 22:31 |
pabelanger | untrusted | 22:31 |
jlk | yeah, okay. | 22:31 |
pabelanger | basically, I think we need to disable lookups for untrusted stuff | 22:31 |
pabelanger | which I think we can do in ansible.cfg? | 22:32 |
jeblair | mordred: ^ fyi | 22:32 |
jlk | file lookups for sure | 22:32 |
jlk | and probably template too | 22:32 |
pabelanger | ya | 22:32 |
jlk | since that is also doing the same thing? | 22:32 |
pabelanger | I didn't want to dig more into it | 22:32 |
pabelanger | https://docs.ansible.com/ansible/intro_configuration.html#lookup-plugins | 22:33 |
pabelanger | we should be able to set that to None for untrusted | 22:33 |
jeblair | if lookup is useful, we can also make a sanitized version of it like the other plugins | 22:34 |
jlk | oh dear there are a lot of lookups | 22:34 |
mordred | oh wow. | 22:34 |
pabelanger | They could be, but lookups are limited to the side where ansible-playbook runs | 22:34 |
pabelanger | so, would we want jobs looking up things on executors? | 22:34 |
jlk | well... | 22:34 |
jlk | you could lookup passwords from a password store | 22:35 |
jlk | or DNS entries, or... | 22:35 |
pabelanger | right | 22:35 |
jlk | oh god, redis. | 22:35 |
mordred | yeah - this is a whole new layer of fun | 22:35 |
jlk | hahaha | 22:35 |
jlk | there's a pipe lookup | 22:35 |
jlk | NOTHING CAN GO WRONG THERE | 22:35 |
jlk | like that's just straight "run my code on your box please" | 22:35 |
pabelanger | I did try to gather facts on localhost, which was blocked. So that is good | 22:35 |
mordred | pabelanger: woot! | 22:36 |
jlk | and env lookups. We don't have anything important in the env, do we? | 22:36 |
mordred | maybe we start with disabling lookups while we go through the list of lookups to sanitize them | 22:36 |
pabelanger | just the defaults set up by bash for the zuul user | 22:36 |
jlk | yeah, shut 'em down | 22:37 |
jlk | we'll turn any on that we absolutely need | 22:37 |
pabelanger | mordred: ++ good place to start | 22:37 |
pabelanger | I mean, container things should help with this too right? | 22:37 |
jlk | eh... | 22:37 |
jlk | if we're going belt+suspenders | 22:37 |
mordred | yah - we should belt/suspenders it in both places | 22:37 |
jlk | trying to prevent local code execution | 22:38 |
pabelanger | okay, I'm good with disabling lookups :) | 22:38 |
jeblair | ++ belt and suspenders | 22:38 |
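One concrete way to "shut 'em down" and then selectively re-enable: point the untrusted ansible.cfg's lookup_plugins path at a zuul-owned directory and shadow each dangerous built-in (pipe, env, redis, ...) with an override that refuses to run. The sketch below uses the standard Ansible LookupBase API; the file location and the wiring into zuul are assumptions.

```python
# Hypothetical override file, e.g. zuul/ansible/lookup/pipe.py, shadowing
# Ansible's built-in `pipe` lookup during untrusted playbook runs.
from ansible.errors import AnsibleError
from ansible.plugins.lookup import LookupBase


class LookupModule(LookupBase):
    def run(self, terms, variables=None, **kwargs):
        # `pipe` runs arbitrary commands on the executor, so refuse outright.
        raise AnsibleError(
            "the 'pipe' lookup is not allowed in untrusted playbooks")
```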
pabelanger | so, I'd like to know the hostname of the executor, so I can log it. | 22:38 |
jeblair | pabelanger: so once we have containers, when we run into something like this we should say "oops -- look, zuul let me do this dangerous ansible that the container caught; let's go fix zuul so that doesn't happen" :) | 22:39 |
jlk | Do we have a list somewhere of all the plugins we're rejecting? | 22:39 |
pabelanger | aside from modifying zuul to add it to vars.yaml, any other suggestions? | 22:39 |
jlk | or are we rejecting _all_ custom plugins? | 22:39 |
pabelanger | jeblair: agree | 22:39 |
jeblair | pabelanger: add it to zuul.executor in vars.yaml | 22:39 |
pabelanger | k | 22:39 |
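For the zuul.executor.hostname variable pabelanger is after, the general shape would be to tag the per-job vars with the executor's hostname before they are written out to vars.yaml. The function and variable names below are hypothetical; only socket.gethostname() and PyYAML's safe_dump are standard.

```python
import socket

import yaml


def write_job_vars(path, zuul_vars):
    """Hypothetical sketch: dump per-job zuul vars, recording the executor."""
    zuul_vars.setdefault('executor', {})['hostname'] = socket.gethostname()
    with open(path, 'w') as f:
        yaml.safe_dump({'zuul': zuul_vars}, f, default_flow_style=False)


# jobs could then log it as {{ zuul.executor.hostname }}
```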
jeblair | jlk: dirs and files in zuul/ansible/* is more or less the list | 22:40 |
jlk | is that a whitelist or a blacklist? | 22:41 |
jeblair | jlk: untrusted mode forces those to be the only callback + action (and soon + lookup) plugins available to ansible, and refuses to run if a playbook or role has a plugins dir | 22:41 |
jlk | okay so we refuse any plugins/ dir | 22:42 |
jlk | that's good | 22:42 |
jeblair | jlk: yep, and override the built-in ones | 22:42 |
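jeblair's "refuses to run if a playbook or role has a plugins dir" amounts to a pre-flight scan along these lines. The function name, the exact set of directory names, and where zuul performs the check are assumptions for illustration.

```python
import os

# Directory names Ansible auto-loads plugins/modules from when they sit next
# to a playbook or inside a role (set assumed for illustration).
FORBIDDEN_DIRS = frozenset([
    'action_plugins', 'callback_plugins', 'connection_plugins',
    'filter_plugins', 'lookup_plugins', 'vars_plugins', 'library',
])


def check_no_plugin_dirs(path):
    """Refuse untrusted content that ships its own Ansible plugins."""
    for root, dirs, _files in os.walk(path):
        for d in dirs:
            if d in FORBIDDEN_DIRS:
                raise RuntimeError(
                    'refusing to run %s: found plugin dir %s' %
                    (path, os.path.join(root, d)))
```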
SpamapS | oh yay, I fixed enough stuff that subunit works again | 22:42 |
SpamapS | FAILED (id=16, failures=26 (-8)) | 22:42 |
jlk | we should add connections to that | 22:43 |
mordred | jlk: whatchamean? | 22:43 |
jlk | We should narrow down what connection plugins are allowed | 22:44 |
mordred | ++ | 22:44 |
jlk | so that a user couldn't influence a host in the inventory to have new facts, which would include ansible_connection | 22:44 |
jlk | do we have something trying to stop users from changing ansible_host ? | 22:45 |
jlk | either by set_fact or add_host ? | 22:46 |
mordred | jlk: we do not - but we do prevent them from connecting to localhost | 22:48 |
jlk | in what way? | 22:48 |
jlk | I think I recall something checking if the connection is "local" or localhost | 22:49 |
jlk | or 127.0.0.1 | 22:49 |
jeblair | pabelanger, Shrews, rbergeron: i have started on the nodepool config structure update described at http://lists.openstack.org/pipermail/openstack-infra/2017-January/005018.html i hope to have it finished this week. | 22:52 |
mordred | jlk: yah - that - we _do_ block add_hjost | 22:54 |
mordred | add_host | 22:54 |
jlk | what about set_fact? | 22:54 |
mordred | jlk: we do not block set_fact for ansible_host - so we should probably do that (and go ahead and do ansible_connection too) | 22:54 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool feature/zuulv3: WIP Update nodepool config syntax https://review.openstack.org/448814 | 22:54 |
jlk | yeah I thought you had something in the connection plugin we force that itself prevents an attempt to hit localhost tho | 22:54 |
mordred | jlk: in the "normal" action plugin | 22:55 |
mordred | jlk: we do: | 22:55 |
mordred | if (self._play_context.connection == 'local' | 22:56 |
mordred | or self._play_context.remote_addr == 'localhost' | 22:56 |
mordred | or self._play_context.remote_addr.startswith('127.') | 22:56 |
mordred | or self._task.delegate_to == 'localhost' | 22:56 |
mordred | or (self._task.delegate_to | 22:56 |
mordred | and self._task.delegate_to.startswith('127.'))): | 22:56 |
jlk | gotcha | 22:56 |
jlk | I wonder if play_context gets updated by the host in question | 22:56 |
mordred | so we should definitely block the set_fact route | 22:56 |
jeblair | pabelanger: for 448814 i have just started matching on provider.name.startswith('fake') for now (re the bug in 447647). i'm not super happy about that but it keeps me moving. | 22:56 |
jlk | like if the host context has the remote addr | 22:56 |
jlk | nod | 22:56 |
jlk | jeblair: for v3, a pipeline can only have one source (driver) still, right? | 22:58 |
mordred | jlk: I should probably do the old versions too - ansible_ssh_host and ansible_ssh_connection - right? those are still 'valid' just deprecated? | 22:58 |
jlk | correct | 22:58 |
jlk | http://docs.ansible.com/ansible/intro_inventory.html#list-of-behavioral-inventory-parameters | 22:58 |
jlk | want to prevent setting any ansible_ssh_ stuff | 22:59 |
jlk | or ansible_sftp* | 22:59 |
jlk | or ansible_scp | 23:00 |
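Blocking the set_fact route mordred mentions could look roughly like the override below: shadow the stock set_fact action plugin and reject the behavioral connection parameters (ansible_host, ansible_connection, and the deprecated ansible_ssh_*/ansible_scp_*/ansible_sftp_* forms). The blocked-prefix list and the subclassing approach are assumptions, not zuul's actual implementation.

```python
# Hypothetical override for Ansible's set_fact action plugin (wiring assumed).
from ansible.plugins.action.set_fact import ActionModule as SetFact

BLOCKED_PREFIXES = ('ansible_host', 'ansible_connection',
                    'ansible_ssh_', 'ansible_scp_', 'ansible_sftp_')


class ActionModule(SetFact):
    def run(self, tmp=None, task_vars=None):
        for key in (self._task.args or {}):
            if key.startswith(BLOCKED_PREFIXES):
                # Fail the task instead of letting a job redirect connections.
                return dict(failed=True,
                            msg='set_fact of %s is not allowed' % key)
        return super(ActionModule, self).run(tmp, task_vars)
```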
jeblair | jlk: currently. that needs to change as described in http://lists.openstack.org/pipermail/openstack-infra/2017-March/005208.html | 23:00 |
jlk | okay | 23:00 |
*** hashar has quit IRC | 23:01 | |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Create zuul.executor.hostname ansible variable https://review.openstack.org/448820 | 23:01 |
pabelanger | jeblair: okay, I haven't looked at 448814 as of yet. Likely tomorrow now | 23:02 |
jeblair | pabelanger: you don't have to look at it; it's not ready yet. i pointed out the bug i ran into on 447647. | 23:04 |
pabelanger | jeblair: okay. I confused the patches | 23:05 |
jlk | jeblair: yeah I think I'm bumping into that change barrier. I wonder if I should stop trying to shove this in, and accept that for now you can have gerrit, or you can have github, but you can't have both. | 23:06 |
jeblair | jlk: fwiw, i'm tentatively planning on working on that next week. :) | 23:06 |
jlk | oh then I definitely should. | 23:06 |
Shrews | jeblair: oh cool. Thx | 23:16 |
jlk | holy crap | 23:35 |
jlk | jeblair: I'm probably missing something really really basic, but I might have made this multiple drivers thing work with just one if statement in scheduler.py.... | 23:36 |
jlk | at least until you dig into it and tear all that apart :) | 23:36 |
SpamapS | jeblair: seems like it might be nice to stack that on top of a re-factored Change/Ref model | 23:42 |
jlk | hah nope, I broke it | 23:43 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul master: Remove Changeish and refactor around Ref as base https://review.openstack.org/448829 | 23:43 |
SpamapS | jeblair: ^^ WIP, fails 26 scheduler tests, but it's a start. :-P | 23:43 |
SpamapS | Some of them I think are failing because they're expecting some of the aspects of NullChange instead of having a Ref | 23:44 |
SpamapS | also I think I might need to dig back to HEAD^ for oldrev | 23:44 |
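A rough sketch of the model shape 448829 is heading toward, per the discussion above: Ref becomes the base object that pipelines enqueue (replacing NullChange), and a Change is just a Ref with review metadata on top. These class bodies are illustrative only, not zuul's real model code.

```python
class Ref(object):
    """Something enqueueable: a git ref of a project at a particular commit."""

    def __init__(self, project, ref, oldrev=None, newrev=None):
        self.project = project
        self.ref = ref          # e.g. 'refs/heads/master'
        self.oldrev = oldrev    # previous sha if known (e.g. HEAD^), else None
        self.newrev = newrev    # the sha to test


class Change(Ref):
    """A proposed change (Gerrit change, GitHub PR) layered on a Ref."""

    def __init__(self, project, ref, number, patchset, **kw):
        super(Change, self).__init__(project, ref, **kw)
        self.number = number
        self.patchset = patchset
```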
jlk | damn, a timer-based trigger doesn't get a trigger_name | 23:54 |
mordred | jlk: when we were talking plugin holes earlier - blocking set_fact of connection things, blocking connection plugins and blocking lookup plugins were the three things we talked about yeah? | 23:55 |
jlk | uh. | 23:56 |
jlk | that reads right from looking at backscroll | 23:57 |
mordred | sweet! | 23:59 |