clarkb | ssh happens in a forked process so I don't think paramiko updating would affect that | 00:00 |
SpamapS | AFAIK we don't use paramiko | 00:00 |
clarkb | we do just at the beginning for host key handling | 00:01 |
SpamapS | yeah just confirmed, ansible is using ssh | 00:01 |
SpamapS | ah dunno about that | 00:01 |
SpamapS | but I see what you're saying | 00:01 |
clarkb | but ya ansible is forked and uses openssh by default | 00:01 |
SpamapS | nodepool might have torched the node | 00:01 |
SpamapS | because my deploy job is just a job that runs on a bastion | 00:01 |
SpamapS | the bastion being a regular node | 00:01 |
SpamapS | hrm.. hard to find the node's real hostname anywhere | 00:04 |
SpamapS | since it was a post_failure | 00:04 |
SpamapS | no logs were saved | 00:04 |
SpamapS | which kinda sucks.. probably bad form on my post playbook part | 00:04 |
clarkb | connectivity problems do make this difficult | 00:04 |
clarkb | I say as I need to address my derp home networking. Warm weather seems to have made my office's wireless bridge device unhappy | 00:05 |
SpamapS | nodepool did delete the node out from under the job | 00:07 |
SpamapS | I wonder if I *am* restarting zookeeper or something | 00:07 |
*** rlandy has quit IRC | 00:08 | |
SpamapS | 2018-04-26 15:45:23,507 DEBUG zuul.AnsibleJob: [build: f287dae10e8d4b98a70a97b21f7f021c] Ansible output: b'RUNNING HANDLER [zookeeper : Restart zookeeper] ********************************' | 00:09 |
SpamapS | yep | 00:09 |
SpamapS | restarted it, which presumably caused the lock to be lost | 00:09 |
SpamapS | well now at least I know | 00:09 |
SpamapS | - meta: flush_handlers | 00:13 |
SpamapS | >:| | 00:13 |
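(Editor's note: the failure above is the classic handler-flush pitfall. A minimal sketch, assuming a role with a `Restart zookeeper` handler: `- meta: flush_handlers` runs any notified handlers immediately, mid-play, instead of at the end of the play, which is how the deploy job restarted the very ZooKeeper holding nodepool's node lock.)

```yaml
# Hypothetical excerpt of such a deploy playbook: the config change notifies
# a restart handler defined in the zookeeper role, and flush_handlers fires
# it immediately.
- name: Install zookeeper config
  template:
    src: zoo.cfg.j2
    dest: /etc/zookeeper/conf/zoo.cfg
  notify: Restart zookeeper

# Runs all notified handlers right here, mid-play; on a host that backs the
# nodepool/zuul ZooKeeper, this drops client sessions and their locks.
- meta: flush_handlers
```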
tristanC | fdegir: zuul.rpm only contains the cli and the module... the doc, webui and services are sub packages. you can get them all using "yum install rh-python35-zuul-*" | 00:25 |
tristanC | clarkb: the lock should survive a zookeeper restart if the client reconnects before the session timeout | 00:39 |
clarkb | tristanC: it may actually happen because nodepool sees all the nodes as aliens if zk isn't responding? | 01:16 |
*** harlowja has quit IRC | 01:23 | |
tristanC | clarkb: can't find that behavior in the launcher code, maybe this happens if a zk call is executed when the service is down | 01:35 |
clarkb | corvus: so I don't forget: your changes to config loading probably deserve a release note | 02:16 |
SpamapS | http://paste.openstack.org/show/719983/ | 03:41 |
SpamapS | Been getting these a lot | 03:41 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul master: Sometimes GitHub doesn't return repo permissions https://review.openstack.org/564666 | 03:54 |
SpamapS | ^^ looks like a simple case of assuming the latest version of an API that isn't stable. | 03:54 |
SpamapS | Heh in fact, looks like GHE 2.13 doesn't even have /collaborators | 03:56 |
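(Editor's note: the fix in 564666 comes down to not assuming the newest shape of the API response. A minimal illustration of the defensive pattern, with `repo_data` standing in for the decoded GitHub response; the names here are assumptions, not Zuul's actual code.)

```python
# Older GitHub Enterprise releases (e.g. 2.13) may omit the "permissions"
# key entirely, so default to no access instead of raising KeyError.
permissions = repo_data.get('permissions', {})
can_push = permissions.get('push', False)
```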
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: mqtt: add basic reporter https://review.openstack.org/535543 | 04:13 |
SpamapS | hrm | 04:36 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: mqtt: add basic reporter https://review.openstack.org/535543 | 04:40 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul-jobs master: Make revoke-sudo work on base cloud-init images https://review.openstack.org/564674 | 04:45 |
SpamapS | ^^ FYI, I want this for our internal cloud tests here at GD, because I want to run things like tox/flake8/etc. with the exact image that most of our users use.. | 04:46 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: mqtt: add basic reporter https://review.openstack.org/535543 | 05:06 |
*** swest has joined #zuul | 05:12 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add OpenAPI documentation https://review.openstack.org/535541 | 05:52 |
SpamapS | hrm, how does ensure-tox work exactly? it installs tox with --user ... but .local/bin is only added to path on login shells.. which you don't get with the command: module. | 05:57 |
SpamapS | Guessing I need to start installing tox without --user | 05:57 |
tristanC | SpamapS: .local/bin could be added to the environment, like so: https://review.openstack.org/#/c/532083/7/roles/ansible-lint/tasks/main.yaml | 06:08 |
SpamapS | tristanC: yeah, it could. But it's not yet. ;) | 06:33 |
SpamapS | and I believe this works fine because tox is pre-installed on custom images | 06:33 |
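(Editor's note: a hedged sketch of the workaround tristanC links above: extend PATH for the task so a `pip install --user` tox is found even in the non-login shell the command module uses. The tox env name is illustrative.)

```yaml
# Make ~/.local/bin visible to this task only; non-login shells spawned by
# the command module do not source the profile that would normally add it.
- name: Run tox
  command: tox -e linters
  environment:
    PATH: "{{ ansible_env.HOME }}/.local/bin:{{ ansible_env.PATH }}"
```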
*** yolanda__ is now known as yolanda | 06:59 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool master: builder: support setting diskimage env-vars in secure configuration https://review.openstack.org/564687 | 07:13 |
*** xinliang has quit IRC | 07:14 | |
*** xinliang has joined #zuul | 07:15 | |
*** ssbarnea_ has joined #zuul | 07:45 | |
*** hashar has joined #zuul | 07:47 | |
*** jamesblonde has joined #zuul | 07:50 | |
*** jpena|off is now known as jpena | 07:52 | |
jamesblonde | hello :) are there people around to answer my questions ? | 07:54 |
tobiash | jamesblonde: just post your question, but note that most people here are located in us timezones | 07:59 |
jamesblonde | that's why I asked, so I will try to stay tuned. My question is: how is nodepool connected to jenkins ? | 08:18 |
openstackgerrit | Matthieu Huin proposed openstack-infra/nodepool master: Add separate modules for management commands https://review.openstack.org/536303 | 08:28 |
openstackgerrit | Matthieu Huin proposed openstack-infra/nodepool master: Add separate modules for management commands https://review.openstack.org/536303 | 08:37 |
jamesblonde | and what is the difference between Zuul launcher + Zuul trigger (v2) and Zuul executor (v3)? were both replaced by it ? | 08:49 |
tobiash | jamesblonde: nodepool v2 or v3? | 09:05 |
tobiash | v3 has no connection to jenkins (as there is no jenkins with zuul v3) | 09:06 |
tobiash | jamesblonde: zuul launcher (v2) was replaced by zuul executor (v3) | 09:07 |
tobiash | jamesblonde: not sure what you mean with zuul trigger (v2) | 09:08 |
*** jamesblonde has quit IRC | 09:10 | |
openstackgerrit | Matthieu Huin proposed openstack-infra/zuul master: zuul web: add admin endpoint, enqueue & autohold commands https://review.openstack.org/539004 | 10:23 |
*** CrayZee has joined #zuul | 10:27 | |
openstackgerrit | Matthieu Huin proposed openstack-infra/zuul master: zuul web: add admin endpoint, enqueue & autohold commands https://review.openstack.org/539004 | 10:30 |
openstackgerrit | Matthieu Huin proposed openstack-infra/zuul master: zuul web: add admin endpoint, enqueue & autohold commands https://review.openstack.org/539004 | 10:45 |
openstackgerrit | Matthieu Huin proposed openstack-infra/zuul master: zuul web: add admin endpoint, enqueue & autohold commands https://review.openstack.org/539004 | 11:19 |
*** jpena is now known as jpena|lunch | 11:56 | |
*** ssbarnea_ has quit IRC | 12:06 | |
*** ssbarnea_ has joined #zuul | 12:07 | |
*** ssbarnea_ has quit IRC | 12:09 | |
*** ssbarnea_ has joined #zuul | 12:13 | |
mordred | tobiash: perhaps we need a FAQ entry for v2 -> v3 migrations - I think this is the second day a similar question has been asked about launchers/executors - might be nice if we had a short page "so you're already running a zuul v2 and looking to upgrade" | 12:18 |
tobiash | mordred: good idea | 12:18 |
mordred | because they're certainly fair questions | 12:18 |
tobiash | mordred: so you're back from traveling hell? | 12:18 |
tobiash | ;) | 12:19 |
*** ssbarnea_ has quit IRC | 12:19 | |
mordred | tobiash: yes! | 12:19 |
mordred | tobiash: my couch at home is much more comfortable than places that are not my couch at home | 12:20 |
*** jamesblonde has joined #zuul | 12:20 | |
tobiash | mordred: I can imagine that | 12:20 |
jamesblonde | the nodepool that comes with zuul v3 ? | 12:21 |
tobiash | jamesblonde: the nodepool that comes with v3 has no linkage to jenkins | 12:21 |
*** ssbarnea_ has joined #zuul | 12:21 | |
tobiash | jamesblonde: as jenkins is replaced in v3 by zuul-executor | 12:21 |
jamesblonde | ok got it, and what if i want to keep using jenkins with gearman plugin ? should I keep executors ? | 12:22 |
mordred | jamesblonde: you should not upgrade to zuul v3 at the moment if you want to keep using jenkins. however, there are a few people - electrofelix is one - who have been working on zuul v3 + jenkins | 12:23 |
jamesblonde | That's my thought right now. Is there a particular reason ? Not tested yet ? | 12:24 |
tobiash | jamesblonde: the data driven architecture has been changed | 12:24 |
tobiash | jamesblonde: the sources are now pushed to the nodes by the executor | 12:25 |
tobiash | jamesblonde: the merger doesn't serve any repos anymore | 12:25 |
pabelanger | mordred: jamesblonde: tobiash: I was recently pointed to https://github.com/jenkinsci/nodepool-agents-plugin for nodepoolv3 and jenkins | 12:25 |
pabelanger | I believe it is coming out of rackspace | 12:25 |
mordred | pabelanger: neat! yah - that's hughsaunders and odyssey4me | 12:26 |
pabelanger | yar | 12:26 |
jamesblonde | Yes, instead of pulling it with jenkins... I am asking because we don't want to use an openstack cloud but wanted to migrate to zuul v3. | 12:26 |
mordred | jamesblonde: oh - well, you don't have to use an openstack cloud with v3 | 12:27 |
odyssey4me | yep, I didn't do the development - that's down to hughsaunders and some other team members... I'm just a tester :) | 12:27 |
jamesblonde | Thanks for your recommendation, i'll check the repo and keep electrofelix in mind | 12:27 |
mordred | jamesblonde: zuulv3 has direct support for pre-defined static nodes | 12:27 |
mordred | as well as a growing number of non-openstack node providers | 12:27 |
mordred | so if that's the reason you wanted to keep your jenkins - we've got you covered :) | 12:27 |
mordred | I should say - nodepool v3 has direct support for static nodes as well as a growing number of non-openstack dynamic node providers | 12:28 |
mordred | zuul v3 has support for whatever nodepool gives it :) | 12:28 |
jamesblonde | So that would be the best for us | 12:28 |
mordred | \o/ | 12:28 |
mordred | odyssey4me: 'just a tester' | 12:29 |
jamesblonde | And in this case zuul executor is not needed like in the v2 ? Should I keep my zuul launcher & trigger ? | 12:30 |
pabelanger | right, zuul-executor will only work with zuulv3 | 12:30 |
mordred | wait - I think y'all just said different things | 12:30 |
mordred | in v3 you need a zuul-scheduler and at least one executor | 12:31 |
mordred | and a nodepool | 12:31 |
jamesblonde | That's what I want, use zuul v3 but using another node pool manager | 12:31 |
pabelanger | ah, yes. I was only focusing on zuul-launcher / zuul-executor part | 12:31 |
mordred | yah - so that's not really a thing with zuul v3 | 12:31 |
mordred | zuul v3 gets nodes from nodepool | 12:32 |
mordred | if you want to use zuul v3 and the nodes are somewhere, the best bet would be to write a plugin for nodepool to get nodes from whatever is managing them | 12:32 |
jamesblonde | ok so zuul v3 is only pre-configured to work with nodepool | 12:32 |
tobiash | jamesblonde: yes, so essentially nodepool is actually a mandatory part of zuul now | 12:33 |
mordred | yes | 12:33 |
mordred | but - it itself is pluggable - so nodepool should be able to get nodes from whatever system - be it static or openstack or ec2 or something homegrown | 12:34 |
tobiash | jamesblonde: but as mordred said nodepool also can manage a pool of e.g. statically defined nodes | 12:34 |
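(Editor's note: for reference, a minimal sketch of a static-node pool in nodepool v3 configuration; the hostnames, label, and key are placeholders, and the exact schema should be checked against the nodepool docs for your version.)

```yaml
providers:
  - name: static-rack
    driver: static
    pools:
      - name: main
        nodes:
          - name: node01.example.com     # pre-existing machine, not launched
            labels: centos-7-static      # label jobs request via nodesets
            username: zuul
            host-key: "ssh-rsa AAAA..."  # placeholder scanned host key
```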
electrofelix | jamesblonde: from my testing so far the upgrade path will be to migrate to nodepool v3 with jenkins; hughsaunders is the person to chat to about the plugin. once we've had a chance to migrate locally ourselves, we're hoping to help him with that plugin and subsequently a zuul-trigger plugin to allow zuul v3 -> jenkins communication | 12:35 |
jamesblonde | ok i am going to think about writing such a plugin | 12:35 |
tobiash | jamesblonde: do you use a system for dynamic node provisioning? | 12:36 |
electrofelix | jamesblonde: I'd get nodepoolv3 integrated with jenkins as a first pass | 12:37 |
mordred | electrofelix: I think the issue is that the reason they were having jenkins in the mix was to avoid nodepool since they have jenkins getting nodes from somewhere else | 12:37 |
jamesblonde | we use the AWS cloud only, but it is more about VM instances | 12:37 |
mordred | jamesblonde: there is an ec2 driver up for review already actually | 12:37 |
mordred | jamesblonde: https://review.openstack.org/#/c/535558/ | 12:38 |
pabelanger | yah, best to talk with tristanC about nodepool drivers, he writes them in his sleep :) | 12:38 |
jamesblonde | my goal is to use a Zuul v2 like behavior with a set of dynamic nodes to manage ephemeral resources (because today we have 5 fulltime jenkins masters running) | 12:38 |
tobiash | jamesblonde: in this case I think you want v3 without jenkins and with nodepool and https://review.openstack.org/535558 | 12:39 |
*** rlandy has joined #zuul | 12:39 | |
jamesblonde | I think that's exactly what we are looking for | 12:40 |
mordred | sweet | 12:41 |
jamesblonde | so I was wrong to think that nodepool was made for OpenStack-based clouds => as you can read here https://docs.openstack.org/infra/nodepool/ "It is designed to work with any OpenStack based cloud," | 12:41 |
mordred | oh. heh. good call! | 12:41 |
jamesblonde | (i am french so that sentence made me think I had to use an openstack based cloud, and not standalone physical or virtual machines) | 12:42 |
openstackgerrit | Monty Taylor proposed openstack-infra/nodepool master: Clarify in doc introduction that OpenStack is not required https://review.openstack.org/564746 | 12:44 |
mordred | jamesblonde: ^^ maybe that will prevent such confusion next time | 12:44 |
jamesblonde | oh indeed | 12:45 |
jamesblonde | yes i did not see that one | 12:45 |
jamesblonde | good review btw ;) | 12:45 |
mordred | \o/ | 12:46 |
jamesblonde | It is clearer in my brain now ^^ i am going to test it using the aws driver | 12:46 |
mordred | jamesblonde: sweet. I think tristanC has used that for some things, so it should work, but it's also new, so please let us know if you have any issues with it | 12:47 |
jamesblonde | of course I'll come back and write some doc too | 12:48 |
jamesblonde | (if i plan to use it | 12:48 |
Shrews | i think the proposed aws driver is fairly limited | 12:48 |
Shrews | more of a WIP | 12:49 |
mordred | Shrews: you're more of a WIP | 12:50 |
jamesblonde | i will let you know but could be a good idea to contribute on it | 12:50 |
mordred | ++ that would be very welcome | 12:50 |
Shrews | /ignore mordred --reason "just cause" | 12:50 |
*** jpena|lunch is now known as jpena | 12:54 | |
openstackgerrit | Merged openstack-infra/zuul master: Fix zuul home directory in zuul from scratch document https://review.openstack.org/564386 | 12:57 |
openstackgerrit | Merged openstack-infra/nodepool master: Clarify in doc introduction that OpenStack is not required https://review.openstack.org/564746 | 12:57 |
*** dkranz has joined #zuul | 13:14 | |
SpamapS | Even if it is limited.. | 13:49 |
SpamapS | It's needed. | 13:49 |
SpamapS | And you have to start somewhere. | 13:49 |
SpamapS | What's blocking it currently? | 13:49 |
* SpamapS going through the review slowly | 13:49 | |
SpamapS | also, if we do want people to write drivers, https://review.openstack.org/#/c/535555/ is critical | 13:50 |
SpamapS | (and has two +2's.. so...) | 13:50 |
SpamapS | Make that 3 | 13:53 |
SpamapS | Shrews: was there some unstated reason we haven't landed 535555? | 13:53 |
SpamapS | Actually it just looks like it's been sitting ready to ship for a few days, so, +3'd | 13:54 |
Shrews | SpamapS: nope | 13:55 |
SpamapS | werd | 14:02 |
Shrews | i sort of want to make the drivers pluggable (except for openstack and maybe static) so others don't need to wait on nodepool releases to get the latest and greatest driver | 14:04 |
Shrews | i wonder how others feel about that though | 14:04 |
Shrews | also, i don't really want to review AWS changes :) | 14:04 |
openstackgerrit | Merged openstack-infra/nodepool master: Refactor NodeLauncher to be generic https://review.openstack.org/535555 | 14:05 |
Shrews | or VMWare changes | 14:07 |
Shrews | or Azure changes | 14:07 |
Shrews | etc | 14:07 |
mordred | Shrews: well - I'd agree, except the horizon/neutron plugin testing mess makes me think twice about that | 14:08 |
Shrews | mordred: i am not aware of the details there | 14:08 |
mordred | Shrews: it's solvable - but basically the out of tree driver needs the thing it's a driver for in order to test - it's probably fine for us since we release frequently | 14:09 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool master: Add fedora-28 to nodepool dsvm https://review.openstack.org/559211 | 14:09 |
Shrews | mordred: well, i mean, the shifting of responsibility to make sure it works with nodepool is the main reason i'd like it pluggable | 14:11 |
Shrews | because we can't test with any other provider ourselves | 14:11 |
Shrews | so we'd just be guessing | 14:11 |
Shrews | but if a driver author has those resources to test... great | 14:11 |
mordred | nod. yeah - it's a topic we should certainly consider how to deal with | 14:12 |
Shrews | i just imagine someone coming to #zuul and saying "hey, the aws driver doesn't do this thing" | 14:12 |
Shrews | how can we (other than tristanC) test and fix? | 14:13 |
mordred | Shrews: that said, I bet markatwood would give us quota to test the ec2 driver | 14:13 |
Shrews | mordred: there's still vmware, azure, kubernetes, some-other-latest-greatest-thing | 14:13 |
Shrews | i guess i've already decided which way *I* lean on this :) | 14:14 |
mordred | vmware is the only one f those that seems problematic though - since we'd have to install vmware and that would suck | 14:14 |
mordred | Shrews: hehe | 14:14 |
pabelanger | it would be great for driver authors to set up third-party CI on the nodepool driver somehow | 14:14 |
pabelanger | and report results | 14:14 |
rcarrillocruz | problem is on clouds/products that are not free | 14:15 |
pabelanger | unless openstack-infra gets credentials to azure / aws | 14:15 |
Shrews | it's not just access... it's a working knowledge of the thing | 14:15 |
rcarrillocruz | like, tristan developed his aws driver by using the free tier account | 14:15 |
rcarrillocruz | but that goes away after a year methinks | 14:15 |
pabelanger | yah | 14:15 |
mordred | I think my concern is that I don't want to wind up with key things in a contrib ghetto | 14:15 |
Shrews | e.g., i don't have any desire to learn vmware | 14:15 |
rcarrillocruz | lol | 14:15 |
mordred | however, whatever we can do to make sure that they're in good shape and reasonable for people to use, I'm in favor of | 14:16 |
pabelanger | Shrews: easy, nova vmware driver | 14:16 |
mordred | pabelanger: ;) | 14:16 |
pabelanger | monies please | 14:16 |
rcarrillocruz | this is what ansible folks use to test vmware modules, https://github.com/vmware/govmomi/tree/master/vcsim , but yeah, i hear what Shrews says about 'knowing everything about all drivers to review them' | 14:16 |
pabelanger | :D | 14:16 |
mordred | yah. the ansible community choice to empower driver authors to care about their driver is more scalable than them all having to learn all of the drivers | 14:17 |
mordred | so it might be more a matter of figuring out where the line is: which drivers we think are important enough that we should collectively learn something about them | 14:17 |
mordred | and also have a mechanism for people who want to care and feed for a driver that we can not care about | 14:18 |
Shrews | i think this warrants a ML discussion. i can start that up | 14:18 |
mordred | coolio | 14:19 |
mordred | cause I think the major cloud providers (other than openstack of course) - ec2, gce and azure - are ones we should have out of the box support for - just like having out of the box support for github for zuul | 14:20 |
mordred | now - the others - the digital oceans and mac stadiums - the line gets much more blurry for me | 14:20 |
Shrews | i think the line needs to be drawn on what we actively test | 14:21 |
Shrews | not on popularity | 14:22 |
Shrews | but i'll put that in the initial email | 14:22 |
tobiash | Shrews, mordred: the pluggable driver interface was discussed a few months ago and the decision at that time was that we want such a thing but need time to stabilize the driver api first before making that public | 14:25 |
mordred | tobiash: ++ | 14:26 |
mordred | Shrews: oh totally - but I think we should actively test ec2, azure and gce in addition to openstack | 14:26 |
tobiash | I think corvus wanted to land a few more drivers before making that step to get real experiences | 14:27 |
mordred | (assuming, of course, we can get donated quota to do such a thing) | 14:27 |
tobiash | so maybe we want to wait until some of tristanC's drivers landed to validate that the internal api works and can be published | 14:27 |
Shrews | mordred: if we can actively test them, i'm more ameniable to having them in-tree | 14:28 |
Shrews | amenable | 14:29 |
Shrews | words are hard | 14:29 |
* mordred hands Shrews a box of ameniable rhinocerouses | 14:29 | |
Shrews | mmm, yummy | 14:29 |
dmsimard | mordred: I think I found a bug in the zuul UI ? If I go here: http://zuul.openstack.org/jobs.html and then ctrl+f our oddly specific job "legacy-grenade-dsvm-cinder-mn-sub-volbak", clicking on the "builds" link changes the link in the address bar to http://zuul.openstack.org/builds.html?job_name=legacy-grenade-dsvm-cinder-mn-sub-volbak but it doesn't actually refresh the page to go to the builds for that job. | 14:32 |
rcarrillocruz | we could team up with ansible/ansible to see if they could donate us 'some' quota for those providers | 14:33 |
rcarrillocruz | hint hint | 14:33 |
mordred | dmsimard: that doesn't seem awesome | 14:36 |
mordred | dmsimard: although I need to finish the angular5 patch (one more thing outstanding) - so let's check it against that (tracking it down in the current code is likely not going to be the world's most fun thing) | 14:37 |
dmsimard | mordred: np | 14:38 |
mordred | dmsimard: http://logs.openstack.org/89/551989/31/check/zuul-build-dashboard/f6d6097/npm/html/builds.html?job_name=legacy-grenade-dsvm-cinder-mn-sub-volbak <-- worked on top of the angular5 patch | 14:39 |
mordred | dmsimard: so - I think I've fixed your bug in an upcoming patch | 14:39 |
dmsimard | mordred: going to that URL directly works | 14:39 |
dmsimard | mordred: it's clicking on the builds link from the jobs page that doesn't, let me try | 14:39 |
mordred | dmsimard: ya - but I got to that by following your process | 14:39 |
mordred | http://logs.openstack.org/89/551989/31/check/zuul-build-dashboard/f6d6097/npm/html/jobs.html | 14:40 |
dmsimard | ah, ++ | 14:40 |
dmsimard | mordred: you're so good you fix problems you didn't even know you had :) | 14:40 |
mordred | unfortunately I have a half-done fix for a different problem sitting on that patch locally - but I haven't touched it in a week so I don't remember what the problem was anymore | 14:40 |
*** jimi|ansible has quit IRC | 14:42 | |
corvus | mordred, tobiash, Shrews: i very much think that a reasonable set of popular drivers should be in-tree in order to be useful for users. and yes, they should be tested, though i'm not sure they always need to tested against live systems -- betamax/mocks/fakes may be enough in some circumstances. as core reviewers we don't need to know everything about them. we need to make a good api interface so that | 14:50 |
corvus | people who do know about them can maintain them. | 14:50 |
Shrews | i don't understand the reasoning that having them in-tree makes it more useful | 14:51 |
Shrews | it may make it simpler | 14:52 |
Shrews | email just sent, btw | 14:52 |
*** acozine1 has joined #zuul | 14:53 | |
corvus | Shrews: yes, simpler is useful | 14:53 |
Shrews | and i fear the "drive-by" driver contribution. we accept a new driver, but the author then disappears and doesn't maintain it | 14:54 |
corvus | Shrews: having them in or out of tree has no impact on that. if there's no one to maintain it, it's dead either way. | 14:55 |
corvus | Shrews: we need to be responsible for nodepool being usable and functional; it's too important for us to outsource that. | 14:56 |
corvus | Shrews: i'm not saying *all* drivers need to be in-tree | 14:56 |
corvus | Shrews: but most of the ones on the table so far should be, because they're all pretty major players. | 14:57 |
corvus | (i'm fine with creating an out-of-tree driver interface after we have openstack/aws/k8s/... in tree) | 14:58 |
*** gtema has joined #zuul | 15:09 | |
*** jimi|ansible has joined #zuul | 15:12 | |
*** jimi|ansible has joined #zuul | 15:12 | |
openstackgerrit | Matthieu Huin proposed openstack-infra/zuul master: zuul web: add admin endpoint, enqueue & autohold commands https://review.openstack.org/539004 | 15:29 |
*** myoung is now known as myoung|email-unl | 15:32 | |
*** myoung|email-unl is now known as myoung|emailplz | 15:32 | |
corvus | how about we merge my config changes? | 15:34 |
corvus | i hit w+1 on the ones that lacked it | 15:34 |
corvus | maybe we can restart openstack-infra with them today and see how they perform | 15:35 |
clarkb | corvus: and watch cacti memory graphs | 15:37 |
corvus | clarkb: oh, also, i agree about a release note | 15:38 |
jimi|ansible | mordred / Shrews : does zuul support restart sub-jobs yet? having a discussion about our current CI and figured if zuul doesn't do this yet we should probably start pestering you for it now :) | 15:40 |
clarkb | jimi|ansible: you are describing a feature that would let you tell zuul it is ok for a job to run up to N times before succeeding, and if it eventually succeeds, treat it as a success? | 15:43 |
clarkb | or is this restart in another context? | 15:43 |
mordred | clarkb: I think this is "recheck specific-job" | 15:46 |
clarkb | ah | 15:46 |
jimi|ansible | yeah just restart a job due to transient network/etc. failures | 15:48 |
jimi|ansible | for example, in ansible we do integration tests across all the distros, and quite often we'll see failures on ubuntu or fedora for example due to failures in the apt/yum/dnf/whatever tests because the remote resource had an issue | 15:49 |
jimi|ansible | so rather than re-run the entire test suite just restart that sub-job | 15:49 |
openstackgerrit | Merged openstack-infra/zuul master: Don't store references to secret objects from jobs https://review.openstack.org/553596 | 15:50 |
gtema | sorry for the stupid question. When I install a fresh nodepool and configure a static pool with 1 host, should 'nodepool list' show this node? I'm trying to install an on-premise zuul but struggling here. I see that nodepool tries to login to the host upon service restart, but it fails and no proper log information is available | 15:55 |
*** jamesblonde has quit IRC | 15:55 | |
*** hashar is now known as hasharAway | 15:56 | |
clarkb | gtema: reading the code it looks like static node info isn't written into zookeeper until first use, and zookeeper's node records are where `nodepool list` output comes from | 16:01 |
corvus | gtema: i believe it should not appear in the list. personally, i think it should, but the driver isn't implemented that way. | 16:02 |
corvus | i'd like us to revisit that. | 16:02 |
clarkb | corvus: ++ it appears that when the static nodes are launched() their records are written; we can probably just write records for all of them on startup, then launch will update the status? | 16:02 |
corvus | yeah, it seems like it should be possible | 16:03 |
tobiash | corvus: maybe you want to rebase the timeout fix to the start of your stack to minimize rechecks ;) | 16:03 |
corvus | i think we talked about it in review; i'm not sure why it didn't work out | 16:03 |
*** hasharAway is now known as hashar | 16:03 | |
corvus | tobiash: yeah, now that i've incurred the cost, i'll do that :) | 16:03 |
clarkb | gtema: as for debugging the ssh, I don't know that nodepool actually tries to login, but it will ask the remote node for its ssh hostkey | 16:04 |
clarkb | gtema: what logs do you have? | 16:04 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Allow extra time for some ansible tests https://review.openstack.org/564572 | 16:05 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Fix race in test_bubblewrap_leak https://review.openstack.org/564640 | 16:05 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Perform late validation of secrets https://review.openstack.org/553041 | 16:05 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Perform late validation of nodesets https://review.openstack.org/553088 | 16:05 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Late bind projects https://review.openstack.org/553618 | 16:05 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Make config objects freezable https://review.openstack.org/562816 | 16:05 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Remove layout from ParseContext https://review.openstack.org/563695 | 16:05 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Remove 'base' from UnparsedAbideConfig https://review.openstack.org/563757 | 16:05 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Change TestMaxTimeout to not run ansible https://review.openstack.org/564562 | 16:05 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Store source context on all config objects https://review.openstack.org/564563 | 16:05 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Cache configuration objects in addition to YAML dicts https://review.openstack.org/564061 | 16:05 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Stop deep-copying job variables https://review.openstack.org/564564 | 16:05 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Remove source_context argument to Pipeline https://review.openstack.org/564642 | 16:05 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Simplify UnparsedConfig.copy https://review.openstack.org/564647 | 16:05 |
corvus | all rebase ^ | 16:05 |
clarkb | corvus: I'm assuming that gerrit will reapply our +2 in most (all) changes since the thing changing was two lines in the tests | 16:05 |
clarkb | corvus: let me know if I need to rereview something | 16:06 |
gtema | only by switching nodepool to debug, manually changing the 'normal' handler in logconfig.py to DEBUG | 16:06 |
corvus | (we need to make that a command line argument) ^ | 16:06 |
clarkb | gtema: can you share those logs with a paste service so that we can see what it is doing? | 16:06 |
gtema | clarkb: and on the target host failed attempts from audit.log | 16:06 |
gtema | clarkb: https://pastebin.com/H9CdA3Vi - nodepool.log | 16:09 |
gtema | clarkb: immediately after restart in the /var/log/messages: https://pastebin.com/S05KiRCf | 16:12 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Add regex support to project stanzas https://review.openstack.org/535713 | 16:12 |
tobiash | clarkb, corvus: rebased ^ to match your stack | 16:13 |
corvus | tobiash: thanks and sorry | 16:13 |
tobiash | I just hit the rebase button ;) | 16:14 |
tobiash | votes are retained | 16:14 |
clarkb | gtema: my initial reading of the nodepool logs is that nothing is wrong, nodepool creates the records it needs then is waiting for node requests from zuul | 16:14 |
tobiash | corvus: shall we +w this too or do you want a further review on that? | 16:14 |
corvus | tobiash: let me sanity check it in the new context | 16:14 |
clarkb | and I don't see whee nodepool would be logging into the remote host, it definitely does a keyscan though | 16:14 |
tobiash | ok | 16:14 |
clarkb | oh wait its gonna do the ready check isn't it /me digs more | 16:15 |
tobiash | clarkb: it does a keyscan during reconfig | 16:15 |
gtema | clarkb: ok, thanks. I was confused that the nodes are not listed. Will continue the zuul setup. But would those nodes be listed only while tasks are executing there, or permanently after the first task was executed? | 16:15 |
corvus | gtema: only when tasks are executed, i believe | 16:16 |
gtema | clarkb: ok, thanks | 16:16 |
clarkb | tobiash: ya I see the keyscanning. gtema best guess is that the keyscan implementation attempts to do a login to get the key(s)? | 16:16 |
tobiash | clarkb: no, it just does a keyscan | 16:17 |
tobiash | so it can hand them over to zuul | 16:17 |
clarkb | there is definitely a paramiko.start_client then client.get_remote_server_key | 16:17 |
clarkb | unsure if the start_client will attempt a login? | 16:17 |
clarkb | or at least appear that way from audit.log's perspective | 16:17 |
corvus | clarkb: maybe any ssh connection that doesn't end with a login is a "login failed" ? | 16:18 |
corvus | from sshd's pov | 16:18 |
clarkb | corvus: ya | 16:18 |
clarkb | also no account info in that logged entry | 16:18 |
clarkb | which lines up with I just made keyscan | 16:18 |
corvus | gtema: so best guess is that everything's working okay, and if you continue with zuul setup so it requests a static node, it should (hopefully) work | 16:19 |
gtema | ok, thanks | 16:19 |
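(Editor's note: what the keyscan amounts to, roughly: open an SSH transport, read the server's host key, and hang up without ever authenticating, which is why sshd's audit log shows a connection but no login. A hedged sketch with paramiko; host and port are assumed.)

```python
import socket

import paramiko

# Perform only the SSH handshake and grab the host key; never authenticate.
sock = socket.create_connection(('node01.example.com', 22), timeout=10)
transport = paramiko.Transport(sock)
transport.start_client()
key = transport.get_remote_server_key()
print(key.get_name(), key.get_base64())
transport.close()
```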
tobiash | corvus: I think I'll rebase the regex change on top of your complete stack, currently it's somewhere in the middle | 16:22 |
corvus | tobiash: sounds good | 16:22 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Add allowed-triggers and allowed-reporters tenant settings https://review.openstack.org/554082 | 16:31 |
openstackgerrit | Merged openstack-infra/zuul master: Allow extra time for some ansible tests https://review.openstack.org/564572 | 16:44 |
openstackgerrit | Merged openstack-infra/zuul master: Fix race in test_bubblewrap_leak https://review.openstack.org/564640 | 16:45 |
tobiash | hrm, everything broken, I think I have to restructure the regex change | 16:47 |
clarkb | oh ya types move and stuff | 16:47 |
corvus | tobiash: yeah, that's what i was worried about; let me know if you need help. | 16:47 |
tobiash | corvus: just some guidance about the way to choose, in UnparsedConfig.copy | 16:48 |
tobiash | first choice is to keep the regex projects grouped by regex, but then it would be something extra during copy | 16:49 |
tobiash | or making that a list and group them by regex in tenantparser._addlayoutitem | 16:50 |
tobiash | I'm leaning towards option 2 even if that may incur a slight performance cost | 16:50 |
corvus | tobiash: hrm, i'm not sure i understand completely. in both options, where would you separate out the regex projects from the regular ones? | 16:53 |
corvus | tobiash: also, while i'm thinking about it, my guess is that your main loop should go in Layout.getProjectPipelineConfig now | 16:53 |
corvus | tobiash: maybe the thing to do is to just keep them in the project list with all the others in UnparsedConfig, but then separate them out into their own list or dict in parseConfig. | 16:55 |
corvus | (so UnparsedConfig only has "projects" and ParsedConfig has "projects" and "projects_by_regex") | 16:56 |
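(Editor's note: a minimal sketch of the split corvus describes, with function and attribute names assumed for illustration: the unparsed layer keeps one flat project list, and the parse step separates regex stanzas from literal ones.)

```python
import re

def parse_projects(unparsed_projects):
    """Hypothetical parse step: split literal project stanzas from regex ones.

    unparsed_projects is a list of raw dicts, each with a 'name' key that is
    either a literal project name or a regex (marked by a leading '^').
    """
    projects = {}
    projects_by_regex = {}
    for stanza in unparsed_projects:
        name = stanza['name']
        if name.startswith('^'):
            if name not in projects_by_regex:
                projects_by_regex[name] = (re.compile(name), [])
            projects_by_regex[name][1].append(stanza)
        else:
            projects.setdefault(name, []).append(stanza)
    return projects, projects_by_regex
```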
openstackgerrit | Merged openstack-infra/zuul master: Perform late validation of secrets https://review.openstack.org/553041 | 16:58 |
openstackgerrit | Merged openstack-infra/zuul master: Perform late validation of nodesets https://review.openstack.org/553088 | 16:58 |
openstackgerrit | Merged openstack-infra/zuul master: Late bind projects https://review.openstack.org/553618 | 16:58 |
tobiash | right, the unparsed config should not know about regex | 16:58 |
tobiash | I'll try that | 16:58 |
clarkb | I like that separation as the unparsedConfig is just raw datastructures | 16:59 |
corvus | friendly reminder, today is a fine day to update https://etherpad.openstack.org/p/zuul-update-email | 17:02 |
corvus | mordred: clarkb and i were just having a chat in etherpad about the fact that we probably should have added a release note about the new re2 dependency | 17:07 |
*** jpena is now known as jpena|off | 17:07 | |
corvus | mordred: do you know if we can retroactively add a note? | 17:08 |
corvus | (i mean, obviously we can add it to the next release, but i mean is there a way to get it categorized under the previous one?) | 17:08 |
*** gtema has quit IRC | 17:09 | |
corvus | i'll ask in #openstack-release | 17:09 |
*** jimi|ansible has quit IRC | 17:13 | |
corvus | mordred: there's a mypy error in http://logs.openstack.org/28/564628/2/check/tox-pep8/0c5268b/job-output.txt.gz | 17:17 |
corvus | mordred: oh, i think it's correct :) | 17:20 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Report git sha in status page version https://review.openstack.org/564628 | 17:21 |
*** kmalloc has joined #zuul | 17:22 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Add release note about re2 https://review.openstack.org/564847 | 17:27 |
corvus | clarkb: ^ apparently we can just do that :) | 17:28 |
corvus | it's probably worth thinking about whether we want to add release notes for dependency additions though. one could argue that openstack-infra is just broken because we don't run bindep on our install. :) | 17:28 |
openstackgerrit | Matthieu Huin proposed openstack-infra/zuul master: zuul web: add admin endpoint, enqueue & autohold commands https://review.openstack.org/539004 | 17:34 |
JosefWells | Hey, zuul masters, I was wondering if any other CI systems have a similar nearest-non-failing algorithm for starting test runs, etc | 18:00 |
clarkb | JosefWells: the only one that comes to mind is chef's thing oh what is it called. Its not open source but is zuul inspired | 18:06 |
JosefWells | I've seen similar systems in semiconductor companies, but nothing open source till zuul | 18:07 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Add debug info to test_slow_start https://review.openstack.org/564857 | 18:11 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Add regex support to project stanzas https://review.openstack.org/535713 | 18:14 |
tobiash | corvus: had to reimplement half of this but got it running now ^ | 18:15 |
JosefWells | Thanks clarkb! I'm off to play with zuul! | 18:16 |
tobiash | corvus: 564847 results in a strange ordering of the release notes: http://logs.openstack.org/47/564847/1/check/build-sphinx-docs/a6f6b7e/html/releasenotes.html | 18:21 |
SpamapS | JosefWells: I believe the Prow folks are thinking of doing it. | 18:23 |
SpamapS | https://github.com/kubernetes/test-infra/tree/master/prow | 18:23 |
SpamapS | but for now IIRC it uses a simpler "1+n" window algorithm where they try 1, and then 1+n, and that way they have a chance at landing 1 or 1+n changes. | 18:24 |
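(Editor's note: a hedged sketch of that simpler "1 and 1+n" idea as described here, not Prow's actual implementation: test the head change alone and the whole window together, so at least one of the two outcomes can merge.)

```python
def test_window(changes, run_tests):
    # Speculatively test the head change and the full batch in one round.
    head_ok = run_tests(changes[:1])
    batch_ok = run_tests(changes)
    if batch_ok:
        return changes      # the whole window can merge
    if head_ok:
        return changes[:1]  # fall back to landing just the head change
    return []               # nothing merges this round
```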
clarkb | tobiash: looks like the tests didn't have to chagne though thats good | 18:28 |
*** electrofelix has quit IRC | 18:31 | |
mordred | corvus: wow - mypy caught an actual thing? neat | 18:31 |
SpamapS | #winning | 18:31 |
tobiash | clarkb: yeah, had to reimplement almost everything except the tests ;) | 18:31 |
SpamapS | Hm.. feature idea.. let trusted playbooks request holds. | 18:35 |
SpamapS | It would be cool to basically be able to say "If you find XYZ in the logs, and the author doesn't have any other holds active, hold these nodes" | 18:36 |
clarkb | SpamapS: you could implement it as a playbook/role with a secret (to talk to nodepool) | 18:38 |
SpamapS | yeah, I also just want nodepool to have a rest API | 18:39 |
SpamapS | so I can do exactly that | 18:39 |
SpamapS | I need a non CLI non-shared-box UI for nodepool | 18:39 |
*** jimi|ansible has joined #zuul | 18:39 | |
*** jimi|ansible has joined #zuul | 18:39 | |
*** elyezer has quit IRC | 18:39 | |
SpamapS | right now I have people logging in and sudo'ing to nodepool/zuul to make holds and clean them up | 18:39 |
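(Editor's note: the manual flow being described maps onto the existing autohold RPC command; a hedged example of the CLI invocation on the scheduler host, with the tenant/project/job values as placeholders.)

```shell
# Hold the nodes of the next failing run of this job for later ssh debugging.
sudo -u zuul zuul autohold --tenant mytenant --project org/repo \
    --job tox-py35 --reason "debugging transient failure" --count 1
```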
openstackgerrit | Andreas Jaeger proposed openstack-infra/zuul master: Fix description for DependentPipelineManager https://review.openstack.org/564862 | 18:40 |
clarkb | tobiash: left a couple comments but they don't apepar to be regressions so didn't -1 | 18:43 |
openstackgerrit | Andreas Jaeger proposed openstack-infra/zuul master: Fix some code description https://review.openstack.org/564862 | 18:44 |
Shrews | SpamapS: you've seen https://review.openstack.org/539004 ? | 18:45 |
clarkb | SpamapS: I'm sure your copying of the entire journald data contents is related, but is it common to not be able to debug based on logs in your env? | 18:47 |
clarkb | (it's one of the big things I push back on with openstack teams: if you can't debug it from the logs then your ops can't either) | 18:47 |
SpamapS | clarkb: people use it as a dev-on-demand service | 18:49 |
SpamapS | write the patch, throw at wall, log in and fix wrong assumptions, repeat | 18:49 |
SpamapS | works pretty well | 18:50 |
SpamapS | would like this to be a first class paradigm in zuul eventually | 18:50 |
openstackgerrit | Merged openstack-infra/zuul master: Make config objects freezable https://review.openstack.org/562816 | 18:51 |
openstackgerrit | Merged openstack-infra/zuul master: Remove layout from ParseContext https://review.openstack.org/563695 | 18:51 |
clarkb | ah so assumption is that initial pass will fail and dev will jump on to iterate | 18:51 |
clarkb | interesting | 18:51 |
SpamapS | sometimes | 18:52 |
SpamapS | not always | 18:52 |
SpamapS | just a common like, "I need to fiddle with it some" | 18:52 |
SpamapS | and rather than having a parallel vagrant path.. | 18:52 |
SpamapS | just zuul for all | 18:52 |
corvus | SpamapS: i don't see a problem with this in principle, but i think we'll want to explore the ux around it a bit. how would the idea of, rather than requesting it in a playbook, simply every failed job was auto-held, perhaps up to a per-author or per-tenant limit or something? could even be a limit of 1 -- so the last failed job for $author is auto-held for 24 hours. | 18:54 |
corvus | (to be clear, i'm just brainstorming) | 18:54 |
SpamapS | Yeah I've been wondering that too. | 18:54 |
SpamapS | Have had similar thoughts | 18:55 |
SpamapS | Another thought I've had is to dump an SSH key into a recheck comment. | 18:55 |
SpamapS | Like "I'm a trusted person and I want to be able to get into the nodes if this fails" | 18:55 |
SpamapS | recheck-with-hold | 18:56 |
SpamapS | something like that | 18:56 |
SpamapS | anyway.. just something I'm thinking about | 18:56 |
SpamapS | too many ideas to get done | 18:56 |
SpamapS | For a team of about 10 users, the current method is working fine. | 18:57 |
SpamapS | But I can see it failing to scale quickly. | 18:57 |
corvus | SpamapS: that's a promising idea too -- it sounds like it could have a good level of delegation there (presumably could be enabled per-pipeline) | 18:57 |
openstackgerrit | Merged openstack-infra/zuul master: Remove 'base' from UnparsedAbideConfig https://review.openstack.org/563757 | 18:58 |
openstackgerrit | Merged openstack-infra/zuul master: Change TestMaxTimeout to not run ansible https://review.openstack.org/564562 | 18:58 |
openstackgerrit | Merged openstack-infra/zuul master: Store source context on all config objects https://review.openstack.org/564563 | 18:58 |
openstackgerrit | Merged openstack-infra/zuul master: Cache configuration objects in addition to YAML dicts https://review.openstack.org/564061 | 18:58 |
openstackgerrit | Merged openstack-infra/zuul master: Stop deep-copying job variables https://review.openstack.org/564564 | 18:58 |
openstackgerrit | Merged openstack-infra/zuul master: Remove source_context argument to Pipeline https://review.openstack.org/564642 | 18:58 |
openstackgerrit | Merged openstack-infra/zuul master: Simplify UnparsedConfig.copy https://review.openstack.org/564647 | 18:58 |
corvus | welp, that's that landed! | 18:58 |
*** elyezer has joined #zuul | 19:00 | |
*** spsurya has quit IRC | 19:01 | |
openstackgerrit | Merged openstack-infra/zuul master: Report git sha in status page version https://review.openstack.org/564628 | 19:15 |
*** myoung|emailplz is now known as myoung | 19:18 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Add regex support to project stanzas https://review.openstack.org/535713 | 19:35 |
tobiash | clarkb: adapted to your comments ^ | 19:36 |
openstackgerrit | Fatih Degirmenci proposed openstack-infra/nodepool master: Add nodepool service file for CentOS7 https://review.openstack.org/564872 | 19:49 |
corvus | tobiash: dhellman says it's a bug in reno and is unrelated to that patch. the sections can end up in any order, and in fact, i think we're seeing it in action right now with them changing on the website. | 19:54 |
corvus | tobiash: https://storyboard.openstack.org/#!/story/2001934 | 19:54 |
corvus | so it's not related to the change to add the re2 releasenote, that should be safe to land | 19:54 |
mordred | corvus: we've exercised reno a bit recently haven't we? | 19:55 |
corvus | ayup | 19:56 |
tobiash | Ah ok | 19:56 |
pabelanger | fdegir: left a suggestion on 564872 | 20:01 |
*** CrayZee has quit IRC | 20:06 | |
fdegir | pabelanger: just looking at it | 20:11 |
fdegir | pabelanger: i didn't get this part of the comment: it will combined both files and use the proper path for centos. | 20:12 |
fdegir | pabelanger: when you say "both files", which files do you mean? | 20:12 |
pabelanger | you'd install the existing nodepool-launcher.service and new nodepool-launcher.d/centos.config | 20:14 |
fdegir | pabelanger: now i got it | 20:14 |
pabelanger | centos.conf | 20:14 |
fdegir | right when you responded | 20:14 |
pabelanger | :) | 20:14 |
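(Editor's note: a hedged sketch of the drop-in approach: ship the stock unit unchanged and layer a small CentOS-specific override next to it. The override contents here are illustrative; the real ones are whatever 564872 settles on.)

```ini
# /etc/systemd/system/nodepool-launcher.service.d/centos.conf
# Merged over the stock nodepool-launcher.service; override only what
# differs on CentOS 7 (here, hypothetically, the entry-point path).
[Service]
ExecStart=
ExecStart=/usr/bin/nodepool-launcher
```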
fdegir | but... | 20:15 |
fdegir | pabelanger: even though systemd seems to be happy, sudo systemctl start nodepool-launcher hangs | 20:15 |
pabelanger | does nodepool-launcher -d work? | 20:16 |
fdegir | that works | 20:16 |
pabelanger | if so, you might have permissions issues | 20:16 |
fdegir | I think we need the start command updated | 20:16 |
pabelanger | no, I suspect you cannot create the pid file | 20:16 |
fdegir | and modified it like the one i have in the centos one, with -d | 20:16 |
fdegir | pid file is there | 20:16 |
fdegir | i was looking at the service file from softwarefactory | 20:17 |
pabelanger | should be in /var/run/nodepool, which systemd creates with RuntimeDirectory=nodepool | 20:17 |
pabelanger | is the nodepool-launcher process running, maybe strace | 20:17 |
fdegir | Job for nodepool-launcher.service failed because a timeout was exceeded. See "systemctl status nodepool-launcher.service" and "journalctl -xe" for details. | 20:17 |
fdegir | it is running | 20:17 |
pabelanger | forking? | 20:17 |
fdegir | yes | 20:17 |
fdegir | if I use the one from sf with Type=simple and /usr/bin/nodepool-launcher -d | 20:18 |
fdegir | it works | 20:18 |
pabelanger | might need guessmainpid=no | 20:18 |
pabelanger | and pidfile set | 20:18 |
fdegir | let me try that one | 20:18 |
pabelanger | I stopped testing with centos, but you're likely hitting some issues with systemd and python-daemon. | 20:19 |
pabelanger | you can also enable systemd debugs to get more info on why it is failing | 20:19 |
pabelanger | I guess nothing in journalctl -u nodepool-launcher.service | 20:20 |
fdegir | pabelanger: nodepool-launcher.service start operation timed out. Terminating. | 20:20 |
fdegir | pabelanger: if you look at this one | 20:20 |
fdegir | https://review.openstack.org/#/c/564872/1/etc/centos7/nodepool-launcher.service | 20:20 |
fdegir | the 3 main differences are the Type, ExecStart, and PIDFile | 20:21 |
fdegir | and that one works with no issues | 20:21 |
fdegir | but since i don't have fedora system, I am not sure if the one i sent for centos works on fedora as well | 20:22 |
pabelanger | right, you can use nodepool-launcher -d and type=simple but don't want that to be default | 20:22 |
corvus | heads up that current master may be broken (we apparently have a hole in our testing) | 20:22 |
pabelanger | you should be able to use type=forking, pidfile, execstart | 20:22 |
pabelanger | but likey need more flags on centos | 20:22 |
fdegir | ok | 20:22 |
pabelanger | maybe guessmainpid=no | 20:23 |
pabelanger | I think that will read the PIDfile for the process to watch | 20:23 |
fdegir | tried guessmainpid and it timed our as well | 20:23 |
fdegir | out* | 20:23 |
pabelanger | I'd enable debugging in systemd and see what is happening | 20:24 |
pabelanger | fdegir: but I do use nodepool-launcher -d myself and it works | 20:24 |
pabelanger | we just want zfs to use type=forking | 20:24 |
*** acozine1 has quit IRC | 20:26 | |
fdegir | pabelanger: yes, if i run manually things work | 20:26 |
fdegir | pabelanger: but not as a service | 20:26 |
pabelanger | that to me sounds like a permissions issue or an selinux issue | 20:27 |
pabelanger | might want to check audit logs | 20:27 |
pabelanger | or set selinux to passive for nwo | 20:27 |
pabelanger | now* | 20:27 |
fdegir | sorry, didn't help | 20:28 |
fdegir | the thing is | 20:28 |
fdegir | when i issue systemctl start, i see the process | 20:28 |
fdegir | the pid is in pidfile | 20:28 |
fdegir | the nodepool reporting 2018-04-27 20:27:33,673 INFO nodepool.NodePool: Starting PoolWorker.static-vms-main | 20:28 |
fdegir | so everything seems to be working but the systemctl start doesn't seem to proceed further; it keeps waiting and finally times out | 20:29 |
pabelanger | does process die too? | 20:29 |
fdegir | yes | 20:30 |
pabelanger | yah, likely python-daemon cannot start properly. Check permissions on all folders, eg: /var/log/nodepool, etc | 20:30 |
pabelanger | /etc/nodepool | 20:31 |
pabelanger | if you sudo su nodepool | 20:31 |
pabelanger | then run nodepool-launcher | 20:31 |
pabelanger | it also likely fails | 20:31 |
pabelanger | which common cause is permissions issue | 20:31 |
pabelanger | and because python-daemon has stderr=None, you don't see the failure | 20:31 |
clarkb | (because proper unix daemonization says you should close all open fds) | 20:36 |
pabelanger | yah, wonder if we need a --noop / --dry-run, or script to validate proper permissions on folders so daemon can properly start. Pretty hard for a new user to nodepool to understand what is happening when not using -d | 20:37 |
corvus | pabelanger: https://review.openstack.org/547889 | 20:38 |
pabelanger | yay! | 20:39 |
corvus | if ianw is busy, maybe someone else can port that to zuul | 20:39 |
*** ssbarnea_ has quit IRC | 20:43 | |
fdegir | pabelanger: this is what i get with systemd debugging | 20:50 |
fdegir | pabelanger: https://hastebin.com/ofidunewiw.sql | 20:50 |
clarkb | fdegir: pabelanger I think that is telling us we set the type to forking but the fork parent never exited (we know it did fork though because the child is mentioned in the log) | 20:53 |
fdegir | again, all the permissions are right | 20:57 |
fdegir | i can start things manually | 20:57 |
fdegir | with systemctl start, i see | 20:57 |
fdegir | cat /var/run/nodepool/nodepool.pid | 20:57 |
fdegir | 20732 | 20:57 |
fdegir | nodepool 20732 1 4 20:56 ? 00:00:01 /usr/bin/python3.5 /usr/bin/nodepool-launcher | 20:57 |
fdegir | while systemctl start is waiting | 20:58 |
clarkb | ya rereading docs the parent isn't exiting | 20:58 |
fdegir | and then the stuff you see in log happens | 20:58 |
fungi | could it be blocking on additional (higher-numbered) file descriptors inherited from the shell or something? i've never looked to see whether that daemon library is smart enough to iterate over all bound fds | 21:00 |
fdegir | a few weeks ago when i tried it on fedora, it worked | 21:00 |
fdegir | so this seems to be centos thingy | 21:00 |
fungi | some naive daemonization routines just assume closing stdin, stdout and stderr is sufficient | 21:00 |
fdegir | and seeing sf using simple made me think they have a reason to use simple | 21:01 |
fdegir | they might have faced similar issue | 21:01 |
clarkb | fungi: systemd says it waits for parent to exit | 21:02 |
clarkb | https://pagure.io/python-daemon/blob/master/f/daemon/daemon.py#_812 is how the library decides to detach or not by default | 21:02 |
clarkb | so oddly I think that means we don't want type = forking or we want to set detach process flag to true | 21:03 |
clarkb | this feels like an optimization for systemd | 21:03 |
clarkb | pabelanger: ^ does forking work for you? I Think you said you had tested it on fedora at least | 21:04 |
clarkb | fdegir: try it without the -d and type simple | 21:05 |
pabelanger | clarkb: I can test quickly | 21:05 |
pabelanger | I haven't yet | 21:05 |
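(Editor's note: the gist of the linked python-daemon code: DaemonContext only double-forks when it decides to detach, and it skips detaching when it believes it was started by init, as under systemd, so a Type=forking unit waits forever for a parent exit that never happens. A hedged sketch of forcing the fork; `run_launcher` is a placeholder.)

```python
import daemon

# detach_process defaults to auto-detection and stays False under systemd;
# forcing True restores the double-fork that Type=forking units expect.
with daemon.DaemonContext(detach_process=True):
    run_launcher()  # placeholder for the nodepool-launcher main loop
```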
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Coerce MappingProxyTypes in job vars to dicts https://review.openstack.org/564886 | 21:05 |
*** harlowja has joined #zuul | 21:07 | |
pabelanger | okay, I don't think we tested this on fedora, it is also hanging for me | 21:08 |
pabelanger | let me try something | 21:08 |
fdegir | clarkb: it works | 21:08 |
fdegir | clarkb: i mean without -d and type simple | 21:08 |
pabelanger | clarkb: fdegir: that was the fix, detach_process=True | 21:19 |
fdegir | pabelanger: so forking didn't work on fedora as well? | 21:20 |
pabelanger | only after I patched nodepool/cmd/__init__.py | 21:21 |
pabelanger | I've been using simple and -d myself | 21:22 |
pabelanger | so, if we want to support forking, we'll need to patch nodepool / zuul | 21:22 |
pabelanger | however, having issue with pidfile | 21:22 |
corvus | pabelanger: the zfs docs should work on fedora, are you saying they don't? | 21:53 |
pabelanger | corvus: I was testing with nodepool-builder, let me try nodepool-launcher | 21:55 |
fdegir | i just tried again now and it didn't work | 21:56 |
fdegir | on fedora27 | 21:56 |
fdegir | same timeout occurs there too | 21:56 |
fdegir | Apr 27 21:44:06 fedora.localdomain systemd[1]: nodepool-launcher.service: Start operation timed out. Terminating. | 21:56 |
corvus | are the service files that ended up in the repo the same ones from the earlier version of the docs? | 21:56 |
fdegir | i used the one from nodepool repo | 21:57 |
fdegir | oh | 21:57 |
fdegir | corvus: i just looked at leifmadsen's gist | 21:57 |
fdegir | clarkb: and that gist has simple there so the one in nodepool repo doesn't match to that | 21:58 |
fdegir | corvus: ^ | 21:58 |
pabelanger | yah, nodepool-launcher and forking isn't working. I'm not sure anybody actually tested it | 21:58 |
corvus | gist? | 21:58 |
fdegir | https://gist.github.com/leifmadsen/93b9283d10dfddba096e32fb172cf569 | 21:58 |
pabelanger | it is failing on fedora for me | 21:58 |
corvus | fdegir: oh, that's ... rather out of date :) | 21:58 |
fdegir | because i was 100% sure it worked on fedora for me when he was working with the first version | 21:58 |
fdegir | but the service file contains simple | 21:58 |
corvus | fdegir: this is the most up to date thing, which is derived from that: https://zuul-ci.org/docs/zuul/admin/zuul-from-scratch.html | 21:59 |
fdegir | so if that part of nodepool hasn't changed then the service file that ended up in nodepool wasn't the correct one | 21:59 |
fdegir | corvus: yes | 21:59 |
*** elyezer has quit IRC | 21:59 | |
fdegir | corvus: the "official" one points to service files from nodepool repo | 21:59 |
fdegir | corvus: and that's what i've been working on for centos docs | 21:59 |
corvus | okay, that should work for fedora | 22:00 |
fdegir | corvus: when you said if the right service files ended up in repo then i checked gist | 22:00 |
fdegir | corvus: it doesn't | 22:00 |
corvus | fdegir: oh, i meant the ones from a previous version of the docs, but later than the gist | 22:00 |
fdegir | the official one doesn't work | 22:00 |
corvus | fdegir: to be clear: you're saying if i follow the instructions in https://zuul-ci.org/docs/zuul/admin/zuul-from-scratch.html it won't work? | 22:00 |
fdegir | yes, that's what i am saying | 22:01 |
fdegir | the service file the doc tells user to copy from nodepool repo is the problem | 22:01 |
fdegir | https://zuul-ci.org/docs/zuul/admin/nodepool_install.html | 22:01 |
fdegir | sudo cp etc/nodepool-launcher.service /etc/systemd/system/nodepool-launcher.service | 22:01 |
fdegir | this service file has forking in it | 22:01 |
corvus | okay, that's a problem for which i will drop everything and run through the instructions again | 22:01 |
clarkb | corvus: the issue is https://pagure.io/python-daemon/blob/master/f/daemon/daemon.py#_812 | 22:02 |
fdegir | i think the easiest fix is to switch to simple instead | 22:02 |
fdegir | until nodepool/zuul is patched according to what clarkb just pasted | 22:02 |
clarkb | corvus: fdegir we can either decide to use simple and allow default behavior from ^ or override the default behavior and fork twice | 22:02 |
pabelanger | okay | 22:02 |
pabelanger | the issue is type=forking | 22:02 |
clarkb | sort of | 22:03 |
pabelanger | switching back to type=simple, the pidfile is created properly | 22:03 |
pabelanger | and systemd starts properly | 22:03 |
pabelanger | however, I don't think that is the right way systemd wants the process to work | 22:03 |
pabelanger | we'd need the setting clarkb said above for type=forking I think | 22:03 |
*** rlandy has quit IRC | 22:04 | |
clarkb | right forking is fine if you fork. and simple is fine if you don't fork. Just have to decide which we want | 22:04 |
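(Editor's note: spelled out, the two self-consistent unit shapes look roughly like this; paths are assumed and the two [Service] stanzas are alternatives, not one file.)

```ini
# Alternative 1: stay in the foreground (-d) and let systemd supervise.
[Service]
Type=simple
ExecStart=/usr/bin/nodepool-launcher -d

# Alternative 2: actually double-fork (needs detach_process=True in
# python-daemon) and point systemd at the child's pid file.
[Service]
Type=forking
ExecStart=/usr/bin/nodepool-launcher
PIDFile=/var/run/nodepool/nodepool.pid
```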
corvus | http://git.zuul-ci.org/cgit/zuul/commit/doc/source/admin/zuul-from-scratch.rst?id=28d99222a6cb82aaf7698571359363be6416b38f | 22:04 |
openstackgerrit | Merged openstack-infra/zuul master: Coerce MappingProxyTypes in job vars to dicts https://review.openstack.org/564886 | 22:04 |
fdegir | same problem probably exists for zuul-{scheduler, executor} as well since those service files use type=forking too | 22:04 |
corvus | the service file that was added to nodepool was *not* the one that was in the docs | 22:04 |
pabelanger | nope, I lied type=simple doesn't work | 22:04 |
pabelanger | it was killed after x seconds | 22:05 |
corvus | Shrews: ^ | 22:05 |
clarkb | pabelanger: ok that at least makes me think we didn't do something completely wrong in investigating the forking option | 22:05 |
corvus | pabelanger, fdegir: have you tried the version in http://git.zuul-ci.org/cgit/zuul/commit/doc/source/admin/zuul-from-scratch.rst?id=28d99222a6cb82aaf7698571359363be6416b38f ? | 22:06 |
corvus | pabelanger, fdegir: specifically at http://git.zuul-ci.org/cgit/zuul/tree/doc/source/admin/zuul-from-scratch.rst?id=38b26de3b398e1ee1fa2bcbed0a6bc5105589f67#n254 | 22:06 |
pabelanger | clarkb: yah, enabling detach_process=True is what gets type=forking working | 22:06 |
pabelanger | corvus: testing | 22:06 |
*** elyezer has joined #zuul | 22:08 | |
clarkb | what is odd about fdegir's log is that it seems to indicate there is a child | 22:09 |
clarkb | but the only os.fork happens if detach_process=True | 22:09 |
fdegir | corvus: that seems to work | 22:11 |
pabelanger | confirmed | 22:11 |
fdegir | corvus: it's still alive | 22:11 |
corvus | pabelanger: can you please propose that as a patch. can you also please verify that the zuul service files are the same ones from that version of the documentation? | 22:12 |
pabelanger | but, I don't think systemd will ever use the pid file we are creating as PIDfile is only used with forking | 22:12 |
pabelanger | corvus: sure | 22:12 |
fdegir | pabelanger: can you add me to those changes as reviewer so i can continue with centos instructions based on those? | 22:13 |
corvus | pabelanger, Shrews, tobiash: i'd like us to be very careful with the zuul-from-scratch document. when we make changes, we need someone to actually do the process manually and verify that it works. | 22:13 |
corvus | what happened here is that after i spent several days running through the document and verified everything in it, we made changes based on things that people thought "should work". let's not do that again. | 22:14 |
corvus | so please at least get a review comment from someone -- the author or a reviewer -- that says "i tested this and it works" | 22:14 |
pabelanger | yah, I left a +2 saying I have not tested; I should have really done a +1 | 22:15 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool master: Fix nodepool-launcher systemd file https://review.openstack.org/564901 | 22:18 |
pabelanger | corvus: fdegir: clarkb: ^ that is working systemd file | 22:18 |
pabelanger | for nodepool | 22:19 |
pabelanger | I'll test zuul over the weekend | 22:19 |
fdegir | tried it and it works | 22:20 |
fdegir | thanks all for the help | 22:22 |
fdegir | now i can go back to where i left things | 22:22 |
corvus | fdegir: it looks like the same problem exists for the zuul service files | 22:22 |
corvus | fdegir: you can get a good version of those from the doc i linked earlier until we fix it | 22:23 |
corvus | pabelanger: it looks like you asked Shrews to make the same erroneous changes to the service files in zuul, can we go ahead and fix that now? | 22:23 |
fdegir | corvus: will look for those patches as well and base the work on it | 22:23 |
pabelanger | corvus: yes, I won't be able to test them until later however | 22:24 |
corvus | pabelanger: as long as they match the version i confirmed was working earlier, i'm happy. they're certainly broken now. | 22:24 |
corvus | clarkb: can you approve https://review.openstack.org/564901 ? | 22:26 |
mordred | corvus: I +2d - want me to wait on clarkb or just +A? | 22:26 |
corvus | mordred: +a | 22:27 |
mordred | corvus: done | 22:27 |
fdegir | would you like me to send those? with the new nodepool service files I am now moving on to the zuul steps | 22:27 |
clarkb | sorry finally getting to lunch now | 22:27 |
fdegir | and can verify those service files from the earlier version of the doc and send the change | 22:27 |
corvus | fdegir: i think pabelanger is about to do that in just a few mins | 22:28 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul master: Fix zuul systemd files https://review.openstack.org/564903 | 22:28 |
fdegir | ok | 22:28 |
corvus | seconds even | 22:28 |
fdegir | :) | 22:28 |
pabelanger | revert, but untested | 22:28 |
pabelanger | (by me) | 22:28 |
corvus | they match the ones i tested | 22:29 |
*** hashar has quit IRC | 23:19 | |
openstackgerrit | Fatih Degirmenci proposed openstack-infra/zuul master: Add CentOS 7 environment setup instructions https://review.openstack.org/564948 | 23:24 |
openstackgerrit | Fatih Degirmenci proposed openstack-infra/zuul master: Add CentOS 7 environment setup instructions https://review.openstack.org/564948 | 23:26 |
openstackgerrit | Fatih Degirmenci proposed openstack-infra/nodepool master: Add systemd drop-in file for CentOS 7 https://review.openstack.org/564872 | 23:41 |
openstackgerrit | Fatih Degirmenci proposed openstack-infra/zuul master: Add steps to use systemd drop-in for Nodepool on CentOS 7 https://review.openstack.org/564950 | 23:47 |