pabelanger | Ya, zuulv3 does seem to be performing better | 00:20 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Fix Gearman UnknownJob handler https://review.openstack.org/508992 | 00:39 |
*** smyers has quit IRC | 00:54 | |
*** smyers has joined #zuul | 00:57 | |
*** smyers has quit IRC | 01:39 | |
*** smyers has joined #zuul | 01:45 | |
*** jhesketh has quit IRC | 01:51 | |
*** jhesketh has joined #zuul | 01:51 | |
*** jkilpatr has quit IRC | 02:46 | |
mordred | rcarrillocruz: I was thinking it was going to fail because you added a line to the test that was SUPER long - but if pep8 allows it, awesome | 03:21 |
*** jamielennox has quit IRC | 04:14 | |
*** jamielennox has joined #zuul | 04:18 | |
*** bhavik1 has joined #zuul | 04:44 | |
*** ricky_ has joined #zuul | 08:28 | |
openstackgerrit | Ricardo Carrillo Cruz proposed openstack-infra/nodepool feature/zuulv3: Bring back per label groups in Openstack https://review.openstack.org/509620 | 08:33 |
*** hashar has joined #zuul | 08:34 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool feature/zuulv3: Use same flake8 config as in zuul https://review.openstack.org/509715 | 08:34 |
ricky_ | tobiash: please re-review when you get a sec ^ | 08:34 |
ricky_ | thx | 08:34 |
tobiash | mordred, jeblair, rcarrillocruz: looks like pep8 config in nodepool is broken. ^ would sync it to the same settings as in zuul, but we would have to fix quite some stuff... | 08:35 |
tobiash | ricky_: lgtm | 08:35 |
ricky_ | thx | 08:37 |
*** bhavik1 has quit IRC | 09:23 | |
kklimonda | can I ship my own ansible action plugins with roles/playbooks? Or perhaps I can explain my usecase: I'd like to expose some additional variables to my tasks (for example I have a repo with debian packaging, I'd like to parse debian/changelog and expose version to other tasks as a variable). | 09:41 |
tobiash | kklimonda: I think you can ship your own modules, but not action plugins, as they are restricted by zuul in order to prevent unreviewed code from doing bad stuff | 09:46 |
kklimonda | tobiash: is this just matter of trusted vs untrusted projects? i.e. can I ship action plugin if it's part of a trusted project? | 09:47 |
tobiash | kklimonda: by looking into the code I think it's just restricted for untrusted projects: | 10:28 |
tobiash | kklimonda: http://git.openstack.org/cgit/openstack-infra/zuul/tree/zuul/executor/server.py?h=feature/zuulv3#n1533 | 10:28 |
tobiash | kklimonda: but I don't know what's the default search path of action plugins in ansible | 10:28 |
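Since custom modules are allowed even for untrusted projects, kklimonda's use case (exposing the debian/changelog version to later tasks) could plausibly be served by a small module shipped in a role's library/ directory. A hedged sketch of just the parsing core follows; all names here are illustrative, and a real module would wrap this with ansible.module_utils.basic.AnsibleModule and return the value via exit_json():

```python
import re

# First line of a debian/changelog entry looks like:
#   mypackage (1.2.3-1) unstable; urgency=medium
CHANGELOG_RE = re.compile(r"^(?P<source>\S+) \((?P<version>[^)]+)\)")

def changelog_version(first_line):
    """Return the version string from the first debian/changelog line."""
    m = CHANGELOG_RE.match(first_line)
    if not m:
        raise ValueError("unparseable changelog entry: %r" % first_line)
    return m.group("version")
```

A task could then register the module's output and use the version as a plain variable in subsequent tasks.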
*** ricky_ has quit IRC | 10:59 | |
*** jkilpatr has joined #zuul | 11:15 | |
*** jkilpatr has quit IRC | 11:24 | |
*** jkilpatr has joined #zuul | 11:37 | |
kklimonda | @tobiash thanks, I'll check it out | 12:22 |
*** SotK_ has joined #zuul | 12:41 | |
*** SotK_ has left #zuul | 12:48 | |
dmsimard | Are we planning on reloading the executors sometime soon ? I'd like to have https://review.openstack.org/#/c/509254/ in to properly test zuulv3 elastic-recheck changes | 13:29 |
*** dkranz has joined #zuul | 13:40 | |
fungi | memory utilization on zuulv3 is looking muuuuch better today: http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=63979&rra_id=all | 14:17 |
jeblair | yeah, it's still *a lot* of memory, but not an ever increasing graph | 14:21 |
jeblair | btw, we do know that we leak cached change (and now pull-request) objects. there's not an easy solution to that right now, but they are small, and that's a slow leak. | 14:21 |
mordred | jeblair: with the current rate of change in the scheduler- and existing planned changes - I'm comfortable with a slow leak | 14:30 |
mordred | jeblair: (I mean, I'm fairly certain we'll have at least one change per week we'll want to restart to pick up between now and whenever we could fix the leak) | 14:30 |
jeblair | we have about 1k items in the pipelines right now; so more things in memory than we would even have while normally running -- though i don't know what our proportion of dynamic configs is | 14:33 |
pabelanger | ya, memory does look pretty flat this morning | 14:37 |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Handle double node locking snafu https://review.openstack.org/509603 | 14:38 |
Shrews | jeblair: probably going to need your expertise for a test for that ^^^ | 14:38 |
Shrews | my gerrit "zuulv3" filter is now useless since everyone is using that :( | 14:40 |
jeblair | we should switch to "frank" | 14:42 |
rcarrillocruz | got post failure on zuulv3 for sphinx on https://review.openstack.org/#/c/509620/, but regardless, +1 from Jenkins zuulv2 | 14:44 |
rcarrillocruz | are we good to merge mordred ? | 14:44 |
rcarrillocruz | i made the assert line shorter | 14:44 |
dmellado | rcarrillocruz: I've been seeing that behavior quite a bit, sadly | 14:47 |
rcarrillocruz | thx jeblair | 14:47 |
pabelanger | rcarrillocruz: Docs should not be published for feature branches | 14:48 |
rcarrillocruz | so failure expected | 14:48 |
rcarrillocruz | ok | 14:48 |
pabelanger | ya | 14:48 |
pabelanger | we need to fix the job | 14:48 |
pabelanger | build-openstack-sphinx-docs should be using prepare-infra-docs-for-afs role | 14:48 |
jeblair | pabelanger: that's the doc *build* job | 14:50 |
jeblair | it should run on all changes | 14:50 |
jeblair | and it should publish to logs.o.o | 14:50 |
pabelanger | jeblair: right, but that's how openstack docs was set up, which didn't allow feature branches to be published back when it was zuulv2.5 JJB. | 14:51 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Bring back per label groups in Openstack https://review.openstack.org/509620 | 14:51 |
pabelanger | at one point we did have a build-infra-docs, but I am not sure atm | 14:52 |
pabelanger | looking | 14:52 |
jeblair | pabelanger: let's move this to -infra | 14:52 |
pabelanger | kk | 14:52 |
jeblair | pabelanger, Shrews: i don't see any zookeeper connection issues since i stopped doing objgraph stuff after the restart | 15:04 |
pabelanger | Ya, I think the high CPU load was causing them to drop | 15:06 |
clarkb | cacti reports significantly more sane cpu usage | 15:06 |
pabelanger | clarkb: jeblair: I am noticing we are consuming a lot of HDD space, I am not sure we have the 2nd drive mounted | 15:07 |
jeblair | Shrews: i think we can probably pause the nodepool provider to control when it fulfills requests, and we can probably manually close the zk connection... i don't think we have a facility to stop scheduler processing of zk events while we do that. i think tests for this could be very difficult. | 15:07 |
jeblair | pabelanger: the 2nd drive is all swap | 15:07 |
pabelanger | HA | 15:09 |
pabelanger | nice | 15:09 |
jeblair | Shrews: there are some more unit-test like tests in test_nodepool... maybe we could do it there | 15:09 |
jeblair | Shrews: the test class itself acts as a scheduler, so it has its own onnodesprovisioned event | 15:10 |
jeblair | Shrews: i think i'd give that a shot -- probably make a new test class because you'll want to control onnodesprovisioned and have it do something differently | 15:10 |
jeblair | Shrews: aha! there's even a test in there for disconnects | 15:11 |
jeblair | Shrews: so i think that has almost all the pieces | 15:11 |
pabelanger | jeblair: assuming having all of the 2nd drive as swap is wrong, on the next stoppage of zuulv3 should we rebuild the 2nd drive and set up fstab properly? | 15:13 |
pabelanger | I'm sorry, but if people are upset there was an outage for 5 days because CI was down or sucked, we should have added more people to zuulv3 effort. Its not like we've been asking for more help. | 15:15 |
pabelanger | wow | 15:15 |
pabelanger | that was the wrong window | 15:16 |
mnaser | its nice that the memory of zuul is relatively stable | 15:18 |
mnaser | and my browser cant even handle rendering how big the queue is :D | 15:19 |
dmsimard | mnaser: time to use zuultty | 15:21 |
mnaser | i need to learn gertty first but maybe ill look into that | 15:22 |
mnaser | :p | 15:22 |
dmsimard | I always forget where zuultty is hidden, it's like a subfolder in some other project | 15:22 |
mnaser | google search shows... a result of you in eavesdrop | 15:23 |
mnaser | :p | 15:23 |
Shrews | jeblair: k. i'll see if i can figure something out | 15:24 |
dmsimard | mnaser: lmao | 15:24 |
*** hashar is now known as hasharAway | 15:24 | |
mnaser | dmsimard https://gist.github.com/sileht/c342606a7ba64761936e | 15:24 |
dmsimard | mnaser: nah that's not it.. hang on, let me find it | 15:25 |
dmsimard | mnaser: https://github.com/harlowja/gerrit_view/blob/master/README.rst#czuul | 15:26 |
mnaser | dmsimard nice! | 15:27 |
fungi | jeblair: seeing a recent jump in memory utilization (not huge, but at least pronounced) in the past few minutes... wondering if any of this will drop as zuulv3 catches up on its backlog | 15:30 |
fungi | that is, once we're adding changes more slowly than we're reporting on them | 15:30 |
jeblair | fungi: i doubt it will ever drop due to python memory management.... | 15:30 |
jeblair | also, i don't expect zuulv3's queues to ever shrink in our current configuration | 15:31 |
fungi | "management" needs quotes there ;) | 15:31 |
jeblair | i'm going to poke at more memory things, it may cause disruption again | 15:38 |
*** bhavik1 has joined #zuul | 15:56 | |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Use normal docs build jobs https://review.openstack.org/509833 | 16:49 |
*** bhavik1 has quit IRC | 16:50 | |
pabelanger | jeblair: here is my first attempt at getting kill_after_timeout working locally in GitPython. https://github.com/pabelanger/GitPython/commit/2e78443444c3b836ba3bcd6e6dde62be77ce3779 Not that you are an expert, but any thing pop out as a potential issue? | 16:53 |
pabelanger | when the timeout happens, it will now raise the following: git.exc.GitCommandError: Cmd('git') failed due to: exit code(-9) | 16:53 |
pabelanger | which we can likely trap and then proceed to clean up the repo | 16:53 |
jeblair | pabelanger: i'm pretty sure the as_process option is tightly integrated with the progress output | 16:55 |
jeblair | pabelanger: so you may want to try setting as_process to false if progress is None | 16:56 |
jeblair | pabelanger: however, in parallel to working upstream, why don't you make a local method for us to use instead of that one, so we don't have to wait for a gitpython release? | 16:57 |
pabelanger | jeblair: Oh, I see. Good idea | 16:57 |
pabelanger | jeblair: Sure, I can try my hand at it | 16:57 |
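The "local method" jeblair suggests could look roughly like the sketch below. This is a hedged illustration using plain subprocess, not zuul's or GitPython's actual code: a generic command runner that kills the child on timeout, so a hung git fetch over HTTP(S) becomes a catchable error instead of blocking a merger forever.

```python
import subprocess
import sys

def run_with_timeout(cmd, timeout):
    """Run cmd (a list of strings), killing it if it has not exited
    within `timeout` seconds. A caller wrapping git would catch
    TimeoutExpired and then clean up the partially-fetched repo."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    try:
        out, err = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.communicate()  # reap the killed process
        raise
    if proc.returncode != 0:
        raise RuntimeError("command failed: %s"
                           % err.decode(errors="replace"))
    return out.decode(errors="replace")

# e.g. run_with_timeout(["git", "fetch", "origin"], timeout=300)
```

This sidesteps waiting on a GitPython release while the upstream kill_after_timeout fix lands.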
dmsimard | jeblair, mordred, fungi, clarkb, pabelanger: I don't know if we can make use of this or if it's relevant but it seemed awesome enough that it was at least worth sharing: https://github.com/nickjj/ansigenome | 16:59 |
jeblair | dmsimard: nice, thanks | 17:00 |
*** maxamillion has quit IRC | 17:01 | |
*** maxamillion has joined #zuul | 17:03 | |
dmsimard | There's some interesting features, like making sure we have READMEs, it can generate them, etc. I might poke at it out of personal curiosity to see what it does. | 17:04 |
odyssey4me | dmsimard nickjj also did https://github.com/nickjj/rolespec | 17:17 |
pabelanger | I saw a talk at AnsibleFest SF using the testinfra library too. https://pypi.python.org/pypi/testinfra | 17:22 |
jeblair | we're still using a bit more memory than we should; i'm currently looking at some layouts held in memory because some merger jobs have gotten stuck on the git timeout issue that pabelanger is working on. i'm going to think about whether we need to do anything about that other than just fix the git timeout thing. | 17:22 |
pabelanger | I might start using that for helping test roles | 17:22 |
dmsimard | odyssey4me: one day I would like to see something like serverspec but with ansible | 17:23 |
dmsimard | serverspec being ruby and all that | 17:23 |
dmsimard | https://github.com/larsks/ansible-assertive/ allows for doing non-failing asserts for example | 17:24 |
dmsimard | I had written this a long time ago inspired from stuff that EmilienM did back at Enovance | 17:25 |
dmsimard | https://github.com/dmsimard/openstack-serverspec/blob/master/spec/tests/swift_loadbalancer/swift_proxy_spec.rb | 17:25 |
dmsimard | pabelanger: ah so testinfra is basically like serverspec but in python | 17:26 |
dmsimard | never heard of it before | 17:26 |
* EmilienM hides | 17:27 | |
dmsimard | I guess I want to do the same thing as serverspec and testinfra but with ansible proper :D | 17:27 |
dmsimard | pabelanger: oh, but since testinfra is python, we could probably just easily wrap it inside ansible modules | 17:27 |
pabelanger | dmsimard: well, I'd want it to run outside of ansible. EG: have ansible do its thing, then run the python unit test to validate it worked | 17:28 |
pabelanger | otherwise, if ansible is broken, it will be hard to detect that if running inside | 17:28 |
pabelanger | I'll have to find the talk, but it was about molecule at ansiblefest SF | 17:28 |
dmsimard | are the talks online yet ? | 17:29 |
pabelanger | I think so | 17:29 |
dmsimard | I've heard about molecule too but never used it | 17:29 |
jlk | I know that guy | 17:29 |
jlk | who wrote it | 17:29 |
jlk | He used to work at Blue Box | 17:29 |
jlk | alas, I haven't really given molecule a spin yet :( | 17:29 |
pabelanger | Ya, i don't think we'd use it, since it works like beaker. Meant to setup your nodes into containers, then run ansible. But we have zuul / nodepool to do that | 17:29 |
pabelanger | but the testinfra was interesting | 17:30 |
pabelanger | assert file exists, server runs, etc | 17:30 |
pabelanger | service* | 17:30 |
jlk | yeah, there is need for things like that in the enterprise world | 17:30 |
jlk | we used ansible to set up serverspec | 17:31 |
jlk | to validate the ansible | 17:31 |
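The kind of post-run assertions being discussed (testinfra/serverspec-style "file exists, service runs" checks, executed outside Ansible so a broken Ansible can't mask a broken deployment) can be sketched in plain Python. These helper names are illustrative only, not testinfra's real API:

```python
import os
import shutil

def file_exists(path):
    """Assert-style check: is path a regular file?"""
    return os.path.isfile(path)

def command_available(name):
    """Is an executable with this name on PATH?"""
    return shutil.which(name) is not None

def service_running(name):
    """Crude stand-in for testinfra's host.service(name).is_running:
    scan /proc for a process with that command name (Linux only)."""
    if not os.path.isdir("/proc"):
        return False
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open("/proc/%s/comm" % pid) as f:
                if f.read().strip() == name:
                    return True
        except OSError:
            continue  # process exited while we were scanning
    return False
```

In practice testinfra already provides these primitives behind a pytest fixture; the point is that the validation runs as a separate step after the playbook.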
dmsimard | I almost went as far as getting monitoring checks to run serverspecs in prod on a regular basis | 17:31 |
dmsimard | but got sidetracked by far less fun things | 17:31 |
jlk | particularly because it wasn't a "continual deployment" environment, Ansible was run on-demand. So something like serverspec could catch something messing with the system | 17:31 |
jlk | dmsimard: that's exactly what we did | 17:32 |
jlk | sensu alerts for serverspec failures | 17:32 |
dmsimard | jlk: neat, that's what we wanted to do yeah | 17:32 |
dmsimard | but that was at $oldjob :) | 17:32 |
dmsimard | jlk: so the guy worked at metacloud first? then bluebox? does he have a flair for acquisitions or something ? :p | 17:33 |
jlk | bluebox then metacloud | 17:33 |
jlk | he left BB before acquisition | 17:33 |
jlk | (before I joined BB actually) | 17:34 |
Shrews | jeblair: i'm stumped on this test. if i push up the current iteration of it, would you mind showing me where i'm going wrong? | 17:38 |
pabelanger | okay, I think I fixed kill_after_timeout upstream: https://github.com/gitpython-developers/GitPython/pull/683 working on a zuul function now | 17:41 |
kklimonda | zuul doesn't seem to be doing anything to ensure that all jobs that are part of the dependency graph will run on the same cloud, right? | 17:42 |
pabelanger | nodes in the same nodeset should be on the same cloud | 17:43 |
kklimonda | but nodeset is per job, right? | 17:44 |
pabelanger | Right | 17:44 |
pabelanger | the only way to pin jobs to a cloud, would be to create a unique label for said cloud | 17:45 |
pabelanger | we do this today for tripleo-centos-7 images | 17:45 |
pabelanger | and they only run jobs on tripleo-test-cloud-rh1, for historical reasons | 17:45 |
kklimonda | yes, but I don't want to pin it to a specific cloud, just make sure that a given set of jobs will all run on a single cloud | 17:45 |
Shrews | kklimonda: what's the use case for that requirement? | 17:46 |
pabelanger | I don't think we support that currently | 17:46 |
kklimonda | Shrews: I need to build 1GB of packages and then install them for testing. | 17:46 |
pabelanger | sounds like artifact handling? | 17:47 |
kklimonda | yeah | 17:47 |
pabelanger | ya, so this is something we don't do too well atm | 17:47 |
pabelanger | how we worked around it was regional mirrors / proxies to help with that | 17:47 |
kklimonda | well, probably less than that - a lot of packages are dbg symbols, but I'll still end up with ~100MB of packages that have to actually be transferred per build. | 17:47 |
kklimonda | sure, but mirrors/proxies don't help me when it's all new artifacts each time | 17:48 |
pabelanger | right | 17:48 |
kklimonda | (not that I don't need those anyway) | 17:48 |
pabelanger | we basically have the same issue today with the kolla project. They upload large artifacts to tarballs.o.o, and jobs then download from it. | 17:49 |
clarkb | we've talked about possibly using shared cinder volumes for that which would imply scheduling to one cloud region. | 17:49 |
pabelanger | yah | 17:49 |
clarkb | but cinder volumes aren't currently multi-attachable so that has been possible future work | 17:49 |
dmsimard | nor can we guarantee that cinder will be available in every cloud ? | 17:52 |
clarkb | dmsimard: correct, though only infracloud doesn't among our current clouds iirc | 17:53 |
jeblair | pabelanger: thinking about it a bit more -- maybe we've only seen the git hangs on https? so maybe we should go with that solution, and consider the gitpython timeout as a backup solution we can implement later if needed. what do you think? | 18:04 |
jeblair | Shrews: of course! | 18:05 |
pabelanger | jeblair: actually, yah. Looking at the etherpad it was HTTPS. So sure, let me get some .gitconfig settings going | 18:09 |
pabelanger | jeblair: do you have recommendations on limits for ratelimit and time? | 18:09 |
pabelanger | jeblair: also, I'm having a hard time understanding if our WatchDog for zuulv3 executor is working. I don't see how we abort the process any more | 18:10 |
pabelanger | not important at the moment, maybe when you have spare time | 18:10 |
jeblair | pabelanger: maybe as close to "no data in 30s" as we can get? so probably 30s for the time and the lowest non-zero value you can do for rate? | 18:10 |
jeblair | pabelanger: i thought we time out builds all the time? :) | 18:11 |
pabelanger | we do, i think I'm just not understanding https://review.openstack.org/426306/ which is where we stopped passing the proc into Watchdog class | 18:12 |
pabelanger | but first, I'll do .gitconfig changes | 18:13 |
jeblair | pabelanger: we pass an instance method to the watchdog. the instance method aborts self.proc | 18:14 |
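The watchdog pattern jeblair describes can be sketched as follows. This is a minimal hedged illustration (class and method names are made up, not zuul's actual executor code): the watchdog is handed a zero-argument callable rather than the process itself, and that callable is an instance method which knows how to abort its own self.proc.

```python
import threading
import time

class Watchdog:
    """Fire `function` once if stop() is not called within `timeout`."""
    def __init__(self, timeout, function):
        self.timer = threading.Timer(timeout, function)

    def start(self):
        self.timer.start()

    def stop(self):
        self.timer.cancel()

class FakeBuild:
    def __init__(self):
        self.proc = None      # stands in for the subprocess.Popen handle
        self.aborted = False

    def abortProc(self):
        self.aborted = True   # real code would kill self.proc here

build = FakeBuild()
dog = Watchdog(0.05, build.abortProc)  # short timeout for the demo
dog.start()
time.sleep(0.2)                        # the "build" outlives the timeout
dog.stop()
```

Because only the bound method crosses into the watchdog, the process handle never needs to be passed around, which matches the refactor discussed in review 426306.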
mnaser | status.json for zuulv3 is almost 4.4M at this point heh | 18:16 |
jeblair | what i'd really like to do with that is send updates over websocket | 18:16 |
jeblair | of course, zuul itself barely knows when something changes at this point, so that's a ways down the road | 18:17 |
pabelanger | jeblair: doh, I see it now. Thank you | 18:18 |
*** electrofelix has quit IRC | 18:20 | |
pabelanger | jeblair: okay, settings work locally: http://paste.openstack.org/show/622785/ | 18:24 |
pabelanger | proposing patch for 1000 bytes/sec for 30 sec | 18:24 |
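For reference, git's http.lowSpeedLimit / http.lowSpeedTime settings abort an HTTP(S) transfer whose throughput stays below the limit (in bytes/sec) for the given number of consecutive seconds, turning a hung fetch into a clean failure. A small sketch rendering the ~/.gitconfig stanza for the values proposed above:

```python
def gitconfig_stall_stanza(limit_bps=1000, stall_seconds=30):
    """Render a ~/.gitconfig stanza that makes git abort HTTP(S)
    transfers slower than limit_bps bytes/sec for stall_seconds
    consecutive seconds (defaults are the values proposed above)."""
    return ("[http]\n"
            "\tlowSpeedLimit = %d\n"
            "\tlowSpeedTime = %d\n" % (limit_bps, stall_seconds))

print(gitconfig_stall_stanza())
```

The same values can of course be set with `git config --global http.lowSpeedLimit 1000` and `git config --global http.lowSpeedTime 30`.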
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Handle double node locking snafu https://review.openstack.org/509603 | 18:34 |
Shrews | jeblair: thx. see test_nodepool.py | 18:35 |
dmsimard | mordred, jeblair: https://www.anandtech.com/show/11902/thinkpad-anniversary-edition-25-limited-edition-thinkpad-goes-retro :) | 18:37 |
mordred | dmsimard: yes. it gives me great excitement | 18:39 |
dmsimard | I was almost excited until I saw the Geforce in it ? My W541 has an nvidia card and it has brought me nothing but trouble :( | 18:40 |
dmsimard | Starting at 1899$, ouch | 18:40 |
jeblair | Shrews: hey cool, you found a test for another issue on the etherpad! | 18:45 |
jeblair | that's line 148 -- kazoo callback error | 18:46 |
Shrews | jeblair: quite by accident | 18:46 |
Shrews | jeblair: oh, i just notice i missed setting fail_first_request in setup (used to be there, but then made the base class and accidentally removed it) | 18:49 |
jeblair | Shrews: ok, i think i understand the exception -- in the test, we're doing a zk operation in the callback we can't do -- we can't shut down zk from the zk callback | 18:51 |
jeblair | that means the error is something different from production.....oh.... i bet in production we somehow hit that inside the resubmit (which happens in the callback). | 18:51 |
jeblair | Shrews: oh, no, strike that. | 18:52 |
jeblair | Shrews: the production error is actually something we can ignore, i think. | 18:52 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Add git timeout for HTTP(S) operations https://review.openstack.org/509876 | 18:53 |
jeblair | Shrews: the key to this is that this is happening inside of an exception handler, and the "TypeError: callback() takes 2 positional arguments but 3 were given" error is a red herring | 18:53 |
pabelanger | jeblair: how does that look^ | 18:53 |
jeblair | Shrews: that's a harmless exception, it's the one after that matters | 18:53 |
Shrews | jeblair: the callback is _updateNodeRequest(), yeah? i was having a devil of a time trying to get that to trigger | 18:53 |
jeblair | Shrews: yep | 18:53 |
jeblair | Shrews: how about this? in onNodesProvisioned, set an Event, and .wait() for it inside the main test method. and then kill zk in the main test method | 18:55 |
jeblair | Shrews: then it'll happen outside the callback thread; should work | 18:55 |
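The recipe jeblair outlines can be sketched like this (names illustrative, not the real zuul test code): the ZK callback thread only records what happened and sets a threading.Event; the main test thread wait()s on the event and then performs the disruptive work (killing the ZK connection) outside the callback thread, where it is safe.

```python
import threading

provisioned = threading.Event()
seen = []

def onNodesProvisioned(request):
    seen.append(request)   # record only -- no ZK operations in the callback!
    provisioned.set()

def fake_zk_callback_thread():
    # stands in for kazoo delivering the watch/callback on its own thread
    onNodesProvisioned("request-A")

t = threading.Thread(target=fake_zk_callback_thread)
t.start()
assert provisioned.wait(timeout=5)  # main test thread blocks here
t.join()
# ...the main thread may now kill the ZK connection and assert that the
# request is resubmitted, without racing the callback thread...
```

This keeps the shutdown out of the callback, which is exactly the constraint that made the original in-callback version fail.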
Shrews | jeblair: should I delete the request from onNodesProvisioned before setting the event? i'm not seeing how to invalidate the first request | 19:01 |
jeblair | Shrews: it should be deleted automatically after the zk client disconnects (it's ephemeral) | 19:03 |
Shrews | jeblair: yeah, but doing it in the main test method seems too late. we need it to trigger again by the time it gets back to the main thread. | 19:04 |
Shrews | i hate to admit that i'm totally lost by the sequencing here :( | 19:04 |
Shrews | i tried the Event thing and i'm not seeing it retrigger | 19:05 |
jeblair | Shrews: nevermind the test -- what's the sequence you need to have happen? | 19:07 |
jeblair | Shrews: maybe you can write that up on an etherpad, and i can take a look at it when i get back from lunch | 19:08 |
Shrews | 1) request A fulfilled 2) req A enters event queue 3) before the queue is processed, req A disappears, causing req A to resubmit 4) waitForRequests() returns | 19:09 |
Shrews | 0) submit request A | 19:10 |
Shrews | if anyone else more familiar with zuul testing wants to take a stab, by all means, please :) | 19:11 |
tobiash | Shrews: sounds like monkey patching might be able to help disappearing req in step 3 | 19:17 |
tobiash | I don't have the code at hand currently, but I could imagine that a disappearing req could be injected into the event processing like that | 19:20 |
Shrews | i think the error with my thinking is believing that there is an event queue in test-land. which now leaves me more confuzzled about how to test this | 19:32 |
* Shrews walks | 19:33 | |
*** hasharAway is now known as hashar | 19:50 | |
*** ianw|pto is now known as ianw | 20:00 | |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Handle double node locking snafu https://review.openstack.org/509603 | 20:01 |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Handle double node locking snafu https://review.openstack.org/509603 | 20:02 |
Shrews | gah | 20:02 |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Handle double node locking snafu https://review.openstack.org/509603 | 20:03 |
jeblair | Shrews: here's what i've got: https://etherpad.openstack.org/p/4cZGJDn02i | 20:03 |
Shrews | jeblair: i think i got it now | 20:03 |
*** jkilpatr has quit IRC | 20:03 | |
jeblair | Shrews: and yeah, i think we need to make our own "event loop" in the test, since the test is standing in for the scheduler | 20:03 |
Shrews | jeblair: took me a while to realize how to actually get to my codepath to test | 20:03 |
Shrews | i don't think we need to introduce the event loop | 20:04 |
jeblair | Shrews: well, we want something to call acceptNodes with outdated info, right? | 20:04 |
jeblair | Shrews: (ie, we should end up calling acceptNodes twice in that test) | 20:05 |
jeblair | hope that makes sense; the main test thread is on the left; other threads are on the right | 20:06 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Create git_http_low_speed_limit / git_http_low_speed_time https://review.openstack.org/509893 | 20:06 |
pabelanger | okay, git timeout patches uploaded | 20:07 |
Shrews | jeblair: you don't feel that simulating those conditions as i did in that new PS is sufficient? if not, then yeah, we'll have to add an event loop | 20:08 |
jeblair | Shrews: i hadn't looked. apparently i was working on the etherpad while you were updating the change. | 20:08 |
jeblair | Shrews: i think those are good tests as long as we've got the sequencing right. i think the only thing they don't do is actually exercise the zk disconnect callback. however, test_node_request_disconnect covers that separately, so we may be okay. | 20:11 |
jeblair | Shrews: if you're happy, i'm happy :) | 20:11 |
Shrews | jeblair: not even test_node_request is testing the event queue path. i really just needed a way to exercise acceptNodes(), which i think those do | 20:12 |
Shrews | and my head hurts. :) being dumb/hardheaded takes a lot of energy | 20:13 |
Shrews | must be all the beer from my younger years | 20:14 |
jeblair | Shrews: yeah -- the first thing the scheduler event processor does is acceptNodes; so all of these are doing a first-order approximation of that and assuming nothing interesting happens after | 20:14 |
pabelanger | etherpad also updated | 20:14 |
jeblair | i think that's okay for the scope of these tests | 20:14 |
jeblair | pabelanger: cool, thx. +3 on the first and small -1s on the second | 20:21 |
*** jkilpatr has joined #zuul | 20:22 | |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Create git_http_low_speed_limit / git_http_low_speed_time https://review.openstack.org/509893 | 20:33 |
jeblair | i have found a second memory leak triggered by these git timeouts. i have a test case and fix in progress. | 20:34 |
jeblair | (and i have concluded that we should fix this regardless of the git timeout issue) | 20:35 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Add git timeout for HTTP(S) operations https://review.openstack.org/509876 | 20:35 |
jeblair | pabelanger: do you want to update and restart executors with that ^ ? | 20:37 |
pabelanger | jeblair: great work! | 20:37 |
pabelanger | jeblair: sure, give me a moment to fetch a drink | 20:38 |
*** dkranz has quit IRC | 20:43 | |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Fix path exclusions https://review.openstack.org/509901 | 20:47 |
mordred | jeblair: ok - after thinking WAY too hard about that ^^ I think that should do us | 20:48 |
jeblair | mordred: heh, the last thing i remember from you on the subject was "let me write that real quick!" :) | 20:50 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Remove references to pipelines, queues, and layouts on dequeue https://review.openstack.org/509903 | 20:54 |
jeblair | there's memory leak #2 | 20:54 |
jeblair | i also realized there's another bug that helped uncover that; it's minor, but i'll try to fix it now too. causes us to run too many merge operations, and we could always do with fewer of those. | 20:55 |
mordred | jeblair: yah - 'real quick' got distracted and pushed down on the stack until I remembered that I was going to write it real quick | 20:56 |
jeblair | mordred: zomg the commit-message-longer-than-bugfix club needs a secret handshake. | 20:57 |
jeblair | i was settling in for a long review and now find myself unprepared | 20:58 |
mordred | jeblair: hehehe | 20:58 |
mordred | jeblair: your most recent patch isn't as easy to read as the previous memory leak - you had to touch WAY more than one line there | 20:59 |
jeblair | mordred: indeed; i also haven't run the full test suite against it; could have some lurking bugs still. | 21:00 |
mordred | jeblair: it'll run the full test suite against itself | 21:01 |
clarkb | mordred: does path fix work if user changes $HOME | 21:01 |
clarkb | is that even possible in bwrap? | 21:01 |
mordred | clarkb: unpossible | 21:01 |
clarkb | because passwd is ro? | 21:01 |
mordred | clarkb: the user doesn't have the execution context to change the environment in which ansible-playbook is executed | 21:02 |
pabelanger | okay, starting to restart executors. puppet has been run | 21:02 |
mordred | clarkb: they can set environment in tasks - but those are all executed by ansible-playbook, so are subshells of the shell where the env is checked | 21:03 |
clarkb | mordred: even via something like /proc? | 21:03 |
mordred | well - they can't write to /proc unless the path filter is already busted | 21:03 |
jeblair | (worth noting, this can be improved when ansible 2.4.1 is released and we can get this value from an ansible.cfg file) | 21:04 |
mordred | but it's all sequencing - zuul-executor executes ansible-playbook and passes an explicit environment to that subprocess - the action plugin that checks the path against HOME is in that process - and the user shouldn't have access to change the environment that exists there | 21:04 |
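The check mordred describes can be sketched roughly as below. This is a hedged illustration, not zuul's actual action-plugin code: resolve the candidate path and allow it only if it stays under the HOME that zuul-executor set in ansible-playbook's environment, which the untrusted playbook has no way to change.

```python
import os

def is_path_under_home(path, home):
    """Return True only if `path` resolves to somewhere inside `home`.
    Both sides are realpath()'d so `..` segments and symlinks cannot
    escape the sandboxed work directory."""
    resolved = os.path.realpath(os.path.abspath(os.path.expanduser(path)))
    home = os.path.realpath(home)
    return resolved == home or resolved.startswith(home + os.sep)
```

The startswith check uses `home + os.sep` so that a sibling directory like `/home/zuulx` is not mistaken for a child of `/home/zuul`.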
*** hashar has quit IRC | 21:04 | |
mordred | also what jeblair said | 21:04 |
clarkb | that was going to be my next question, can we supply it directly as config rather than potentially user-changeable env stuff | 21:05 |
mordred | yah - most definitey once 2.4.1 is out | 21:05 |
clarkb | sounds like later, ok | 21:05 |
mordred | but also - if the user is able to change HOME - that should be considered a SERIOUS issue | 21:05 |
mordred | as that would mean that the user was able to execute arbitrary code in a context that they should not be able to execute arbitrary code | 21:06 |
mordred | which is not to say it's unpossible - obviously- but if we find an instance of that we should drop everything and think about nothing but that | 21:06 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Fix doc typo that inverts logic https://review.openstack.org/509905 | 21:17 |
mordred | jeblair: doublecheck me on that ^^ but I just noticed that looking at the docs for something else | 21:18 |
pabelanger | so far, ze05 and ze06 were in a hung state on git clone. They've since been restarted with the fixes; moving to ze07 | 21:21 |
jeblair | mordred: good catch, but i think the fix is different; commented | 21:25 |
pabelanger | jeblair: all executors upgraded and restarted | 21:26 |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Handle double node locking snafu https://review.openstack.org/509603 | 21:27 |
Shrews | noticed i missed a var for a %s ^^^ | 21:27 |
jeblair | dmsimard: ^ re executors | 21:28 |
mordred | jeblair: ah - good - I'll update that - and that tells me I want to set some of our publish jobs to be post-review: true as well | 21:28 |
dmsimard | Yay, thanks | 21:28 |
jeblair | mordred: do we have publish jobs defined outside of project-config? | 21:29 |
jeblair | mordred: or i can just wait for the change and review it :) | 21:29 |
mordred | jeblair: oh - no, we don't. nevermind. all good | 21:30 |
jeblair | kk | 21:30 |
pabelanger | jeblair: I'm going to switch to ze03.o.o stopped issue again | 21:35 |
pabelanger | unless something else you'd like me to do | 21:35 |
jeblair | pabelanger: thanks | 21:41 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Fix early processing of merge-pending items on reconfig https://review.openstack.org/509912 | 21:59 |
mordred | jeblair: wow. that's a fun one | 22:00 |
jeblair | mordred: yeah, that happened, and exposed the memory leak, and i was writing a test around it, and realised, "hey, maybe i shouldn't write a test that replicates the behavior of a bug; maybe i should fix the bug." | 22:07 |
mnaser | jeblair I'm looking at *a lot* of inefficient behaviour at the moment in the zuul status page | 22:15 |
mnaser | would it be okay to introduce 1 additional dependency (mustache) | 22:15 |
*** hogepodge has joined #zuul | 22:15 | |
mnaser | it's a very simple/small javascript templating language that will help the translation of status state <=> html | 22:16 |
jeblair | sounds pretty hipster. :) | 22:19 |
mordred | mnaser: I should probably talk to you about the 'rework how we deal with javascript and html' patches I'm going to get back to once the v3 rollout is done ... | 22:20 |
mnaser | jeblair or maybe angular? that would simplify the code base soooo much | 22:20 |
mnaser | mordred jeblair i could probably get the status page redone in angular.. tonight. maintaining the same look :> | 22:20 |
mordred | mnaser: I believe tristanC's dashboard work introduces angular - so maybe we should just put a pin in improvements here until the rollout is done and we can give some attention to how it's all being stitched together | 22:21 |
mordred | mnaser: we've been holding off on that work until post-rollout so that we don't get too distracted ... one sec though, I'll link you to a couple of patches | 22:21 |
mnaser | ok ill have a look | 22:21 |
mnaser | not like i can help much in the internals of zuul and finding memory leaks :-P | 22:22 |
tristanC | mordred: mnaser: indeed, the zuul-web patches for tenants, jobs and builds are using angular: https://review.openstack.org/#/q/topic:zuul-web | 22:22 |
jeblair | ya -- my only request for angular is that it be understandable by folks who have used web systems other than angular -- i'm able to follow the patterns that tristanC has used fairly easily | 22:23 |
fungi | for some reason i always confuse angularjs with reactjs (the facebook one with the patent license controversy) | 22:23 |
mnaser | jeblair i agree 100% -- i don't want to leave zuul with some complicated codebase no one knows how to fix if i'm not available | 22:24 |
mordred | mnaser: https://review.openstack.org/#/c/487538/ and https://review.openstack.org/#/c/503268/ are the two relevant pieces | 22:24 |
jeblair | mnaser: exactly! :) | 22:24 |
mnaser | mordred ++ to using webpack to manage dependencies | 22:24 |
mordred | mnaser: the first is some initial exploration I did around incorporating javascript toolchains - the second is the first patch from tristanC that adds angular and uses it within the current setup | 22:24 |
jeblair | so maybe building on mordred and tristanC's work is the best bet. i think the only caveat is that we won't really review+merge larger changes like that until after the dust settles (but probably *soon* after the dust settles) | 22:25 |
mordred | yah - I want the dashboard :) | 22:25 |
mnaser | the nice thing is the status page is really well/easily tested | 22:25 |
jeblair | so just know there will be a bit of a delay if we go that route. but in the long run, it's the best i think. | 22:26 |
mnaser | thanks to whoever wrote the ?demo= stuff | 22:26 |
mordred | that's one of the reasons I used status page as a trial balloon for the toolchain stuff :) | 22:26 |
mnaser | i'll work on angular-ifying the status page and then we can "integrate" it with the other angular-based pages later, it shouldn't be too much work (if i do it correctly) | 22:26 |
mnaser | because personally the status page is currently unusable for me, always crashes my browser :( | 22:27 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Fix doc typo that missed important words https://review.openstack.org/509905 | 22:27 |
mnaser | the v3 one that is, with the big status.json file | 22:27 |
jeblair | me too | 22:27 |
mordred | mnaser: I'm rebasing that patch of mine real quick - silly merge conflicts | 22:29 |
fungi | mnaser: is the v2 one at http://zuul.openstack.org/ (not the custom one we have on status.o.o) actually any better in that regard? | 22:31 |
fungi | it also is nigh unusable for me at relatively high queue sizes | 22:31 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Use yarn and webpack to manage status javascript https://review.openstack.org/487538 | 22:39 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Migrate console streaming to webpack/yarn https://review.openstack.org/487539 | 22:39 |
mordred | mnaser: in https://review.openstack.org/487538 if you do "yarn install ; npm run start:livev3" it'll spin it up in a local dev server pointed at zuulv3.openstack.org | 22:39 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool feature/zuulv3: alien-list: use provider name https://review.openstack.org/508788 | 22:42 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Migrate console streaming to webpack/yarn https://review.openstack.org/487539 | 22:52 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: WIP Use yarn and webpack to manage status javascript https://review.openstack.org/487538 | 22:54 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: WIP Migrate console streaming to webpack/yarn https://review.openstack.org/487539 | 22:54 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Remove references to pipelines, queues, and layouts on dequeue https://review.openstack.org/509903 | 23:30 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Fix early processing of merge-pending items on reconfig https://review.openstack.org/509912 | 23:30 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!