leifmadsen | please note there is a lot of precursor "quickstart" notes here: https://etherpad.openstack.org/p/zuulv3-quickstart | 00:00 |
leifmadsen | SpamapS: ^^ | 00:02 |
leifmadsen | I need to get a couple small things out of the way, but then a "omg I just want to run some basic Zuul" is very high on my list, primarily because as every day passes, I realize I need it more and more | 00:02 |
leifmadsen | Next step in my notes is mostly just "run a job". Based on some examples I've seen, I think that's going to be relatively straightforward. Once that works, then it's basically me turning the notes into some documentation, which I'm not too worried about, as I've done a lot of that part. | 00:03 |
tristanC | leifmadsen: hey, fwiw we are about to release software-factory 2.7, and you could get a running v3 setup, with base jobs and a logserver already configured, in 3 commands | 00:08 |
leifmadsen | yea... not really what I'm looking for though | 00:08 |
leifmadsen | I understand everyone has push button infra now :) | 00:08 |
tristanC | "omg I just want to run some basic Zuul" sounds like what you need is actually a push button infra now :) | 00:11 |
leifmadsen | does it do GitHub integration too? | 00:11 |
leifmadsen | not looking to run gerrit | 00:12 |
tristanC | if you add github apitoken and webhook secret to the sfconfig.yaml, sf will add github connections and pipeline to zuul config | 00:13 |
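For readers following along: the real sfconfig.yaml schema is not shown in this log, so the following is purely a hypothetical sketch of the change tristanC describes; the key names are guesses, not the actual software-factory configuration format.

```yaml
# hypothetical sketch only -- consult the software-factory docs for the real keys
zuul:
  github_connections:
    - name: github.com
      api_token: <your-github-api-token>
      webhook_token: <your-webhook-secret>
```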
leifmadsen | I just think SF is going to have far too many moving parts for what I really want to explore | 00:13 |
leifmadsen | also, that doesn't help the "document zuul" effort :) | 00:13 |
leifmadsen | so it's more than just "run zuul" | 00:13 |
leifmadsen | otherwise, I'd look more at Windmill, or Hoist, or XYZ | 00:14 |
SpamapS | tristanC: I'm very excited that you are doing that. :) | 00:15 |
SpamapS | I wanted to use SF | 00:15 |
SpamapS | but it was still too weird for me so I just fell back on BonnyCI/hoist | 00:16 |
SpamapS | leifmadsen: I've got it on my list of things to work on tonight, where I'll have some quiet time in a hotel room. :) | 00:16 |
SpamapS | I have written up the entire bootstrap procedure in a GoDaddy context.... | 00:16 |
leifmadsen | SpamapS: cool, well I'd encourage you to go through what I've gotten so far for sure | 00:16 |
leifmadsen | the idea being to run a single VM, that gets events from GitHub, and trigger it to run a "Hello world" ansible playbook | 00:17 |
SpamapS | Just need to genericise the parts that are like "Go to the system that allocates service accounts in AD and get one in group Z" and make those "you'll need a user account on your cloud that can do A, B, C" | 00:17 |
leifmadsen | I gotta go put the kids to bed, or I'd elaborate more, but had a talk with mordred and jeblair before we started, so they know the general approach | 00:17 |
leifmadsen | SpamapS: that's exactly what I'm trying to avoid :) | 00:17 |
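The "Hello world" payload leifmadsen describes needs nothing more than a trivial playbook; a minimal sketch, with the file path and task names assumed:

```yaml
# playbooks/hello-world.yaml (name assumed) -- the entire job payload
- hosts: all
  tasks:
    - name: Print a greeting
      debug:
        msg: Hello world
```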
pabelanger | much backscroll | 00:18 |
SpamapS | pabelanger: most of it is just wanking from me. ;) | 00:21 |
leifmadsen | SpamapS: so I didn't read the whole scrollback, but what confuses me, is why master == 2.0, and feature/zuulv3 exists at all :) I'd actually have almost thought it'd be the other way around, and there be stable/2.0 and master (future 3.0) | 00:23 |
leifmadsen | I'm sure it's been asked and answered a 1000x though | 00:24 |
pabelanger | leifmadsen: FWIW: at summit, we were really close to getting a hello world job going on zuulv3 for the hands-on workshop. We were lacking an openstack cloud to run jobs on a node. I'm hoping by the next time we demo, something like the OCI nodepool driver will be finished, or I'll ask the openstack passport program to offer up some cloud resources | 00:26 |
mordred | pabelanger: or both! | 00:26 |
leifmadsen | pabelanger: didn't have an RDO Cloud login? | 00:27 |
leifmadsen | that's what I've been using anyways | 00:27 |
pabelanger | leifmadsen: we had 20 users running their own zuulv3, so the idea was each would have their own cloud creds | 00:27 |
pabelanger | mordred: ++ | 00:27 |
leifmadsen | gotcha | 00:27 |
mordred | pabelanger: one of the things that stood out to me doing the walkthrough with leif earlier is that a getting started guide that can use OCI or static or something similar as a step one, with "now you can plumb in your clouds" as step two feels like a nice incremental approach - once we can do that | 00:27 |
pabelanger | leifmadsen: also, might be interested in https://git.openstack.org/cgit/openstack-infra/publications/commit/?id=4f0a375f966171a81be4a1c76983c71359e037a1 for example playbooks for jobs. I did that for my JJB to ansible playbooks talks | 00:28 |
leifmadsen | someone (SpamapS?) mentioned the other day that I could just run against the executor itself too | 00:28 |
pabelanger | mordred: right, I think that is inline with what we talked about too | 00:28 |
mordred | "here's how you can get a zuul on your laptop that will run content all on your laptop with no clouds" - then "here's how you can add clouds" and then "here's how you can add managed base images" ... | 00:28 |
leifmadsen | pabelanger: cool, might be useful for TOAD I guess | 00:28 |
SpamapS | pabelanger: did you just do your hello world job without a nodeset? | 00:28 |
leifmadsen | mordred: +∞ | 00:29 |
SpamapS | Because I was just poking at doing a few dumb jobs on the executor today. | 00:29 |
SpamapS | Like, I have a few that just validate YAML | 00:29 |
pabelanger | SpamapS: we didn't; just noops, due to time. The plan was to use a trusted playbook on the executor | 00:29 |
SpamapS | don't need to install anything, just some python. Don't need a node for that. :) | 00:29 |
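A hedged sketch of the kind of executor-only job SpamapS is poking at, assuming an empty nodeset is how you tell Zuul not to request any nodes; the job and playbook names are made up:

```yaml
# runs its playbook on the executor itself; no nodepool node is requested
- job:
    name: yaml-lint
    run: playbooks/yaml-lint.yaml
    nodeset:
      nodes: []
```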
leifmadsen | I probably need to back up at some point and just drop nodepool entirely in the quickstart | 00:29 |
pabelanger | leifmadsen: yah, that's what we did. I was hoping to demo with RDO cloud on my laptop, but ran out of time | 00:30 |
clarkb | mordred: static to localhost would be super easy | 00:30 |
pabelanger | 90mins goes fast! | 00:30 |
clarkb | mordred: and not require any additional setup or software | 00:30 |
pabelanger | i should check if devconf.cz has a hands-on workshop session too | 00:32 |
leifmadsen | also, I've been wanting to run this on Fedora instead of Ubuntu | 00:32 |
leifmadsen | Fedora has gone well. Pretty sure I'll abandon CentOS for now. | 00:32 |
pabelanger | yah, I've dropped centos for now | 00:32 |
leifmadsen | maybe add some sidebar stuff later, but automating on Fedora is much easier | 00:32 |
pabelanger | but fedora works great | 00:32 |
leifmadsen | I mostly have zero interest in Ubuntu stuff | 00:32 |
clarkb | the only difference I would expect between the two is bubblewrap | 00:33 |
clarkb | everything else should be fairly transparent | 00:33 |
clarkb | virtualenv, run process, win | 00:33 |
pabelanger | clarkb: yah, having fedora shipping bwrap is nice. We need to get our bwrap into backports for xenial | 00:34 |
pabelanger | also, devconf.cz does have working groups :) We should totally do an installfest for zuulv3 | 00:34 |
clarkb | bwrap works great on tumbleweed too | 00:34 |
pabelanger | clarkb: when are we getting a DIB :D | 00:35 |
clarkb | pabelanger: dirk says its up for review | 00:35 |
pabelanger | cool | 00:35 |
clarkb | its on my list of things to do now that I'm home | 00:35 |
clarkb | (review the change) | 00:36 |
pabelanger | okay, submitted JJB to ansible talk again to devconf.cz | 00:48 |
*** jkilpatr has quit IRC | 01:05 | |
SpamapS | My Zuul runs on CentOS 7 | 03:24 |
SpamapS | with bubblewrap from rawhide | 03:24 |
SpamapS | pabelanger: ahh.. a week in snowy Brno. ;) | 03:32 |
pabelanger | :) | 03:33 |
SpamapS | ahhhh.. sweet sweet 1st class upgrade (even on a 60 minute flight.. so good) | 04:16 |
pabelanger | Where did we land on the tox with sudo job, did that ever get resolved? | 04:34 |
SpamapS | interesting | 07:13 |
SpamapS | I tried to make an executor-only job that runs a ruby program, but ruby does not like running in the bwrap | 07:13 |
SpamapS | http://paste.openstack.org/show/626236/ | 07:13 |
* SpamapS heads to bed to ponder this in dreamland | 07:14 | |
*** xinliang has quit IRC | 07:21 | |
*** xinliang has joined #zuul | 07:33 | |
*** xinliang has quit IRC | 07:33 | |
*** xinliang has joined #zuul | 07:33 | |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Add role to build Puppet module https://review.openstack.org/519489 | 07:56 |
*** bhavik has joined #zuul | 08:42 | |
openstackgerrit | Rui Chen proposed openstack-infra/nodepool feature/zuulv3: Fix nodepool cmd TypeError when no arguments https://review.openstack.org/519582 | 08:47 |
openstackgerrit | Rui Chen proposed openstack-infra/nodepool feature/zuulv3: Fix nodepool cmd TypeError when no arguments https://review.openstack.org/519582 | 08:57 |
*** bhavik has quit IRC | 09:23 | |
*** jianghuaw has quit IRC | 09:26 | |
*** rbergeron has quit IRC | 09:34 | |
*** rbergeron has joined #zuul | 09:34 | |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Allow run to be list of playbooks https://review.openstack.org/519596 | 09:53 |
pabelanger | fun review if anybody else is awake | 09:53 |
tobiash | pabelanger: this looks ok to me but I didn't fully understand your use case | 10:05 |
tobiash | the use case described in the commit message should also be possible with one playbook containing multiple plays | 10:06 |
tobiash | or did I overlook something? | 10:06 |
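The alternative tobiash is pointing at: one playbook file containing several plays, which Ansible runs in order with no change to Zuul's run syntax. A minimal sketch; the host groups and tasks are placeholders:

```yaml
# one playbook, two plays
- hosts: zuul
  tasks:
    - name: First play, targeting the zuul group
      debug:
        msg: configure zuul here
- hosts: nodepool
  tasks:
    - name: Second play, targeting the nodepool group
      debug:
        msg: configure nodepool here
```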
openstackgerrit | Akihiro Motoki proposed openstack-infra/zuul-jobs master: Fix npm-run-test https://review.openstack.org/518879 | 11:27 |
openstackgerrit | Akihiro Motoki proposed openstack-infra/zuul-jobs master: Fix npm-run-test https://review.openstack.org/518879 | 11:31 |
odyssey4me | Shrews is the request/response protocol documented anywhere? if that's the API, I'd rather be using it... | 11:46 |
*** tobiash has quit IRC | 12:04 | |
*** tobiash has joined #zuul | 12:06 | |
*** tobiash has quit IRC | 12:06 | |
*** tobiash has joined #zuul | 12:07 | |
*** jkilpatr has joined #zuul | 12:42 | |
Shrews | odyssey4me: it's described in http://specs.openstack.org/openstack-infra/infra-specs/specs/zuulv3.html but there is no proper documentation of the protocol itself | 12:47 |
Shrews | something we should strive to correct | 12:47 |
Shrews | but it's sort of fluid right now until we get a proper release | 12:48 |
odyssey4me | Shrews ah, ok - I think we can work that in... it definitely makes sense to... but yeah, using the same protocol as zuul for requests to nodepool makes a lot of sense | 12:50 |
Shrews | odyssey4me: this is the class zuul uses for the requests: http://git.openstack.org/cgit/openstack-infra/zuul/tree/zuul/model.py?h=feature/zuulv3#n526 | 12:51 |
Shrews | odyssey4me: and this is what nodepool uses for what it expects from a request and uses for fulfillment: http://git.openstack.org/cgit/openstack-infra/nodepool/tree/nodepool/zk.py?h=feature/zuulv3#n348 | 12:52 |
Shrews | the zuul class should be at least a subset of the nodepool class | 12:53 |
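As a rough illustration of the protocol Shrews links to: zuul creates a znode under /nodepool/requests/ whose JSON payload looks approximately like the following. The field names are read from the linked model.py and zk.py; treat this as a sketch of a still-fluid protocol, not a stable API.

```yaml
# data stored at /nodepool/requests/<priority>-<sequence number>
node_types:              # the labels being requested
  - v3-dib-fedora-27
requestor: zuul.example.com
state: requested         # nodepool moves this through pending/fulfilled/failed
declined_by: []          # launchers that could not satisfy the request
nodes: []                # filled in with node IDs on fulfillment
```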
odyssey4me | thanks Shrews - I'll be getting back to that in a week or two. I've unfortunately got to context switch right now to something else. :/ | 12:55 |
mordred | odyssey4me: darned context switching | 12:57 |
odyssey4me | yup, comes with the territory... | 12:57 |
mordred | ++ | 12:57 |
mordred | pabelanger: I agree with tobiash - the patch looks fine but I don't fully understand the words from the commit message | 12:59 |
mordred | pabelanger: the foo and bar playbooks you have in the test could totally be two different plays in the same playbook ... that said, I don't see a reason why run shouldn't be able to take a list | 13:00 |
mordred | pabelanger, tobiash: also, I think we should sit on that one until jeblair gets back - I'm not sure if there was an active reason run was a single playbook and not a list | 13:04 |
openstackgerrit | Akihiro Motoki proposed openstack-infra/zuul-jobs master: Fix npm-run-test https://review.openstack.org/518879 | 13:19 |
rcarrillocruz | mordred, pabelanger : are we good to +A https://review.openstack.org/#/c/453968/11 | 13:33 |
* SpamapS still trying to figure out bubblewrap + ruby fail :-P | 13:35 | |
rcarrillocruz | Shrews: pushed a revision on https://review.openstack.org/#/c/500800/5 last night, pls have a look when you get a sec | 13:38 |
mordred | SpamapS: oh good luck with that :) | 13:38 |
SpamapS | interesting... | 13:42 |
SpamapS | so on the executor, if I rw mount /var/lib/zuul into the bwrap, ruby works fine | 13:42 |
SpamapS | suggesting that ruby uses $HOME weirdly | 13:42 |
SpamapS | yep, that's it | 13:43 |
SpamapS | well that at least simplifies things. :) | 13:43 |
SpamapS | I wonder if we should set $HOME in the bubblewrap driver. | 13:43 |
SpamapS | we change it in /etc/passwd | 13:43 |
tobiash | SpamapS: in this case we should probably | 13:44 |
SpamapS | Yeah, easy patch, and I can't see the harm in it. | 13:46 |
SpamapS | /var/lib/zuul is totally inaccessible otherwise. | 13:46 |
mordred | SpamapS: yah - I think our intent is to make the workdir appear as $HOME - so if we also need to set the variable to get that to happen, seems sane to me | 13:48 |
mordred | SpamapS: setting $HOME should avoid the need to rw mount /var/lib/zuul if I'm reading you right, yeah? | 13:49 |
SpamapS | mordred: yeah, and mounting /var/lib/zuul would be counter to the bwrap mission in this case. :) | 13:50 |
SpamapS | (Or we could bind mount work_dir on top of $HOME .. but I kind of dislike that) | 13:50 |
SpamapS | we rewrote it in /etc/passwd, I think rewriting it in the environment makes a lot of sense. | 13:51 |
SpamapS | mordred: my intention here is to test out running nodeless jobs that do very little. | 13:51 |
SpamapS | this one just runs a silly markdown linter written in ruby | 13:51 |
SpamapS | So I'm thinking, just install that script on the executors and let it run on localhost. Also means we can skip most of the stuff I have in my usual base job that verifies nodes and pushes source. | 13:53 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Override HOME environment variable in bubblewrap https://review.openstack.org/519654 | 13:57 |
*** hashar has joined #zuul | 13:58 | |
*** hashar has quit IRC | 14:00 | |
*** jianghuaw_ has joined #zuul | 14:03 | |
*** dmsimard|off is now known as dmsimard | 14:05 | |
SpamapS | Ugh, cancelling a job that is being watched ... did not go well | 14:13 |
SpamapS | http://paste.openstack.org/show/626270/ | 14:13 |
mordred | SpamapS: I've got some refactoring of that stack on my TDL for this week | 14:14 |
SpamapS | I think I haven't pulled in a while too | 14:14 |
SpamapS | Been avoiding pulling until I can CI my CI | 14:14 |
SpamapS | mordred: if you can make stream.html just retry over and over if it gets an empty log.. that would be great. I constantly click it about 1s before output has started. | 14:15 |
SpamapS | hitting cmd-R is..hard? | 14:15 |
leifmadsen | SpamapS: ruby? doing something weird? say it ain't so! | 14:21 |
tobiash | SpamapS: I thought I had a fix for that | 14:22 |
leifmadsen | SpamapS: but once you have CI for your CI, how will you test your tester? | 14:23 |
* leifmadsen ducks | 14:23 | |
leifmadsen | ok, so I think I got far enough that I have a "hello world" job and a dummy base job to load it from. How do I go about ignoring the nodepool stuff for now and running directly on the executor? I'm not able to run the job right now because Zuul complains that it doesn't have permission to ssh-add /root/.ssh/id_rsa. That seems like a simple enough configuration problem, but I think it only does that because it is | 14:24 |
leifmadsen | trying to connect to the nodepool nodes? | 14:24 |
tobiash | SpamapS: https://review.openstack.org/#/c/514617/ that should have fixed the empty log | 14:25 |
tobiash | (if this was the only cause) | 14:25 |
SpamapS | leifmadsen: when I have CI for my CI I will test my tester by CD'ing my CI'd CI. | 14:30 |
SpamapS | leifmadsen: also, just using a markdown linter. | 14:30 |
SpamapS | If there's a better one in python, I'll use that. :) | 14:30 |
leifmadsen | :buddy_jesus_thumbs_up: | 14:30 |
SpamapS | tobiash: ah yah, just need to restart zuul-executor to get that one. :) | 14:32 |
tobiash | :) | 14:32 |
SpamapS | Oh right.. can't use command: on localhost on untrusted jobs. Well poo. I just need a container thingy then.. trusted jobs are a pain. | 14:38 |
* SpamapS goes back to running it on a tiny VM | 14:39 | |
kklimonda | I’d like to use zuul pipeline and dependent jobs to introduce “checkpoints” (so that zuul skips part of the pipeline that has succeeded on the previous failure and restarts from the first failed job) - i don’t think that’s possible now, but are there any potential risks in the scheduler I should be aware of before I start looking at the code? | 14:42 |
SpamapS | kklimonda: that would require keeping state somewhere. | 14:43 |
SpamapS | kklimonda: but, what restart are you thinking that you want to avoid re-testing? | 14:44 |
kklimonda | good point, that’s correct :) | 14:44 |
kklimonda | in the gate pipeline we’ll be building packages, then docker containers and then running integration testing | 14:45 |
kklimonda | That’s 50 minutes for packaging, another 30+ (probably more) to get containers ready | 14:46 |
kklimonda | All this to rerun a flaky test | 14:46 |
kklimonda | and we do that for 4 different distros, and different OS flavors | 14:48 |
kklimonda | A failure is costly | 14:48 |
kklimonda | So now in our zuul2 setup our jobs check for the existence of artifacts and return early | 14:48 |
kklimonda | But it would be nice to make zuul aware of that so it’s not reporting jobs as taking 30 seconds | 14:49 |
kklimonda | I have thought of adding a different job return status (EXIT_EARLY) to indicate that the job didn’t run (I’m not sure about overloading SKIPPED) but making zuul more aware of the pipeline status would be even nicer | 14:51 |
kklimonda | Obviously, after your comment, that's not something that can be done relatively easily | 14:51 |
*** jkilpatr has quit IRC | 14:53 | |
SpamapS | kklimonda: I think the way you're doing it is pretty nice actually. Your artifact repository is keeping state for you. | 14:54 |
SpamapS | kklimonda: I'd be concerned about missing changes though. What ID do you use to store/fetch them? | 14:55 |
kklimonda | hmm, right now it's a tuple of (change, patchset, job name) | 14:56 |
kklimonda | I believe, I’ll double check when I get to the computer - what should I keep in mind ? | 14:57 |
SpamapS | So, that's going to break if you have a long dependent pipeline. | 15:00 |
SpamapS | If you're single-repo, it's fine. | 15:00 |
SpamapS | But if you have 2 repos in there.. those won't change, but the parent may. | 15:00 |
SpamapS | I've struggled with this a lot with Zuul actually. Need something that stays with the build from the first moment through to after the merge. | 15:01 |
leifmadsen | ok that's weird... I killed the zuul-executor, and restarted it, and now it just dies in the background... | 15:01 |
leifmadsen | oh wait, I know why | 15:03 |
* leifmadsen facepalms | 15:03 | |
leifmadsen | I killed it, so there was no removal of the /var/run/ file | 15:03 |
kklimonda | SpamapS: ha, interesting point - that makes it all a non-starter basically. Thanks | 15:07 |
kklimonda | right now we are not utilizing dependent pipelines and cross-repo dependencies but that's one of the requirements for the newer system. | 15:08 |
kklimonda | but when you think about it, right now it's also broken | 15:10 |
kklimonda | well, we can always have a periodic job that will catch anything that slips through cracks ¯\_(ツ)_/¯ | 15:24 |
kklimonda | I think I'll need a drink | 15:24 |
*** jkilpatr has joined #zuul | 15:58 | |
*** hashar has joined #zuul | 15:59 | |
rbergeron | spamaps: I believe i have located the magical karaoke location nearby-ish if you haven't identified a place yet :) | 16:00 |
*** jkilpatr has quit IRC | 16:21 | |
*** jkilpatr has joined #zuul | 16:29 | |
*** jkilpatr has quit IRC | 16:46 | |
pabelanger | tobiash: mordred: yah, I figured the commit message would need more details, happy to explain. Today, this is the inventory file I use: http://git.openstack.org/cgit/openstack/windmill/tree/playbooks/inventory along with the entry point for ansible-playbook: http://git.openstack.org/cgit/openstack/windmill/tree/playbooks/site.yaml. Because of the way ansible loads includes, if I set up a single playbook run, with | 17:04 |
pabelanger | https://review.openstack.org/#/c/519596/1/tests/fixtures/config/ansible/git/common-config/zuul.yaml as nodesets, a single SSH connection will be used for the ansible-playbook runs. That messes up my variable include structure; the playbooks were written to be run either on a single host or against an inventory file with multiple different hosts. I can use the following nodeset also, | 17:04 |
pabelanger | https://review.openstack.org/#/c/519539/16/.zuul.d/jobs.yaml but that means I need to consume 6 nodes in nodepool. I am hoping ansible 2.4 might fix some of the variable issues by switching to include_playbooks vs include | 17:04 |
pabelanger | So, happy for feedback / suggestions, but in my testing last night, this seems to be the only way to get ansible playbooks to work in zuulv3 from the executor | 17:06 |
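For context, a sketch of what the change under review (519596) would allow: run as a list, where each entry is a separate ansible-playbook invocation, which is what resets the variable scoping pabelanger mentions. The playbook paths are illustrative.

```yaml
- job:
    name: windmill-deploy
    run:
      - playbooks/zuul.yaml
      - playbooks/nodepool.yaml
```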
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Add username to build and upload information https://review.openstack.org/453968 | 17:12 |
*** clarkb has quit IRC | 17:14 | |
Shrews | rcarrillocruz: i believe your last patch set on 453968 (the one you self-approved to fix the tests) exposes an issue, or at least an inconsistency | 17:16 |
Shrews | rcarrillocruz: i'm going to put up a fix | 17:16 |
*** hashar is now known as hasharAway | 17:17 | |
*** clarkb has joined #zuul | 17:22 | |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Be consistent with the ZK data model https://review.openstack.org/519706 | 17:27 |
Shrews | rcarrillocruz: ^^^ | 17:27 |
SpamapS | rbergeron: woot! | 18:13 |
rbergeron | spamaps: how goes your kool-aid drinking? is this your official orientation stuff? | 18:21 |
rbergeron | or was that in sunnyvale a while ago | 18:21 |
*** jkilpatr has joined #zuul | 18:30 | |
mordred | pabelanger: how would you split site.yaml into multiple playbooks otherwise? or would you basically put each of the playbooks listed in your site.yaml in the run list? | 18:55 |
pabelanger | mordred: yah, each in the run list for now is what I was thinking. The other option is to allow users to generate an inventory file like: http://git.openstack.org/cgit/openstack/windmill/tree/playbooks/inventory but with the same IP for the ansible_host setting of each node. That might be something good to support regardless. | 18:59 |
pabelanger | it would be a single node in nodepool, but ansible would think it is multiple hosts | 19:00 |
mordred | pabelanger: you can do that already ... | 19:01 |
mordred | pabelanger: just make a nodeset witha single node and that node assigned to multiple groups | 19:01 |
pabelanger | mordred: like https://review.openstack.org/#/c/519539/14/.zuul.d/jobs.yaml ? | 19:03 |
mordred | pabelanger: yup! | 19:06 |
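What mordred describes, roughly: a one-node nodeset where that node appears in several inventory groups. The label and group names here are illustrative.

```yaml
- nodeset:
    nodes:
      - name: primary
        label: v3-dib-fedora-27
    groups:
      - name: zuul
        nodes:
          - primary
      - name: nodepool
        nodes:
          - primary
```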
pabelanger | mordred: right, so that does not work as I would expect, as it relates to variable scoping. I think because ansible only uses a single SSH connection, variables are not reset between play runs as I would expect when running ansible-playbook multiple times | 19:07 |
mordred | pabelanger: gotcha | 19:07 |
pabelanger | mordred: the good news is, using 5 nodes in nodepool, playbooks work as expected | 19:09 |
mordred | \o/ | 19:10 |
Shrews | converting legacy devstack jobs to use the new native parent job is not as well documented as one would like | 19:49 |
Shrews | something better than zero would be nice | 19:50 |
dmsimard | ianw: so what we do for testing zuul callbacks right now is to set up a "nested" zuul in a multinode job but we're just running ansible-playbook, we're not actually exercising the executor | 20:00 |
dmsimard | ianw: nodeset and job is here: http://git.openstack.org/cgit/openstack-infra/zuul/tree/.zuul.yaml?h=feature/zuulv3#n16 | 20:01 |
dmsimard | I don't know to what extent we could use this approach to exercise base jobs or trusted playbooks from an executor's POV | 20:01 |
dmsimard | mordred, jeblair: has there been any new ideas regarding how to test config-repo/trusted execution ? The fact that these aren't integration tested still bothers me very much. | 20:02 |
dmsimard | I haven't come up with ideas that don't involve somehow setting up a nested zuul and, like, enqueuing a job manually.. but then that also requires an openstack cloud, ugh | 20:03 |
mordred | dmsimard: once we land static node support, we should be able to just make a two-node job that installs zuul on one, registers the othre as a static node and then runs a job | 20:07 |
SpamapS | rbergeron: yes, kool-aid drinking happening | 20:07 |
ianw | mordred: are there changes out there for that already? | 20:10 |
ianw | just thinking that maybe the nodepool dsvm jobs maybe aren't that far off having a zuul added to them | 20:13 |
mordred | ianw: https://review.openstack.org/#/c/468624/ | 20:16 |
dmsimard | mordred: so then what ? the nested zuul has to load all 2000 repositories' worth of configuration before being able to run and stuff ? | 20:19 |
mordred | dmsimard: nah - we'd just write out a config file with only a few repos listed I'd guess - or potentially even just use the git driver and point it at repos on the local disk *waves hands* | 20:20 |
mordred | dmsimard: I mean, I don't know the full answer yet, but I know we should at least be able to do something once static node is there without needing an openstack :) | 20:21 |
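For reference, the static-node support under review (468624) would presumably let nodepool.yaml describe pre-existing hosts along these lines; the field names are assumptions based on the proposed driver, not a final schema.

```yaml
providers:
  - name: static-provider
    driver: static
    pools:
      - name: main
        nodes:
          - name: node01.example.com
            labels:
              - centos-7
            username: zuul
```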
* dmsimard nods | 20:21 | |
dmsimard | progress | 20:21 |
mordred | yah. | 20:21 |
mordred | baby steps | 20:21 |
pabelanger | I'd be totally interested in how we'd trigger a job run in zuul for that | 20:22 |
*** hasharAway has quit IRC | 20:23 | |
mordred | pabelanger: magic? | 20:23 |
pabelanger | maybe fedmsg :) | 20:23 |
mordred | :) | 20:23 |
mordred | pabelanger: fedmsg is becoming the new AFS ... it's the answer we roll out for all the questions :) | 20:24 |
pabelanger | indeed! | 20:24 |
dmsimard | pabelanger: I was thinking just ... "zuul enqueue thing thing" | 20:37 |
dmsimard | Shrews: seeing NodeExists errors in our zookeeper.. which apparently can only come from here: http://git.openstack.org/cgit/openstack-infra/nodepool/tree/nodepool/zk.py?h=feature/zuulv3#n1308 | 20:54 |
dmsimard | Shrews: if I stop the (only) launcher we have, I stop seeing the error in the zk logs.. but they resume as soon as I start nodepool. I tried nothing and I'm all out of ideas. | 20:55 |
dmsimard | for example: http://paste.openstack.org/raw/626294/ | 20:56 |
Shrews | dmsimard: you tried nothing? lol | 20:59 |
Shrews | dmsimard: what does your config look like? | 20:59 |
Shrews | your pools need unique names | 20:59 |
Shrews | i'm guessing you have 2 pools under the np3 provider named "main" | 21:02 |
Shrews | or you have two providers named "np3" (which is less likely, but possible i guess) | 21:03 |
dmsimard | Shrews: I was mostly kidding, just not sure where to look. As far as I can tell, this is a config with one pool and one launcher | 21:06 |
dmsimard | I can pull up some configs, hang on | 21:06 |
Shrews | dmsimard: if it's not your config, then i'll have no idea since that's the only thing that i know it could possibly be | 21:08 |
dmsimard | Shrews: this is the nodepoolv3 nodepool.yaml: http://paste.openstack.org/raw/626296/ | 21:09 |
Shrews | dmsimard: if that's your config, and you're sure you're starting only a single launcher, that is truly a mystery then because that should not be possible | 21:11 |
Shrews | even with multiple launchers that shouldn't happen because the process ID is part of the launcher ID | 21:12 |
dmsimard | Shrews: /me googles how to query zk | 21:13 |
Shrews | dmsimard: oh, wait | 21:13 |
Shrews | that's not a nodepool error message | 21:13 |
Shrews | that's from zk | 21:13 |
Shrews | dmsimard: it's fine, unless your nodepool is not working | 21:14 |
dmsimard | yeah in fact the issue is a NODE_FAILURE but there's no real hints in zuul or nodepool logs | 21:14 |
dmsimard | at least afaict | 21:14 |
dmsimard | so I was digging around | 21:14 |
Shrews | dmsimard: the main pool worker loop always tries to register itself, but ignores that error | 21:15 |
dmsimard | Shrews: ah, okay, so red herring. | 21:15 |
Shrews | yeah | 21:15 |
Shrews | dmsimard: http://git.openstack.org/cgit/openstack-infra/nodepool/tree/nodepool/launcher.py?h=feature/zuulv3#n266 | 21:16 |
dmsimard | ah, okay. | 21:16 |
dmsimard | Shrews: these are the logs I'm getting: http://paste.openstack.org/raw/626297/ | 21:16 |
dmsimard | the debug logs aren't particularly helpful, only adding a line of context: http://paste.openstack.org/raw/626298/ | 21:20 |
Shrews | dmsimard: i can't tell much from that, but given the config you pasted, and that output, i'm guessing you're asking for a "centos-oci" node type, but that is not a valid label for that launcher. | 21:21 |
Shrews | v3-dib-centos-7 is defined | 21:21 |
dmsimard | oh man that's it | 21:21 |
dmsimard | 2017-11-14 21:21:21,281 DEBUG nodepool.driver.openstack.OpenStackNodeRequestHandler[zuulv3.27-rc1.com-29912-PoolWorker.np3-main]: Declining node request 200-0000000007 because node type(s) [centos-oci] not available | 21:21 |
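So the mismatch: the job's nodeset asked for the label centos-oci, while the launcher only defines v3-dib-centos-7. The quick fix on the zuul side is presumably to request a label the launcher actually has; the job and node names here are illustrative.

```yaml
- job:
    name: hello-world
    nodeset:
      nodes:
        - name: test-node
          label: v3-dib-centos-7
```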
dmsimard | that probably should be raised from debug ? | 21:21 |
dmsimard | I mean, INFO or something | 21:21 |
Shrews | yeah, was gonna say there should be a message | 21:22 |
Shrews | i think debug is appropriate, IMO | 21:22 |
clarkb | zuul side should probably error though? | 21:22 |
clarkb | its not an error on the nodepool side, but if zuul is requesting invalid labels it would be an error to zuul? | 21:23 |
Shrews | zuul knows (and logs) that the request failed, but doesn't know the reason | 21:23 |
Shrews | we don't pass that info thru zk | 21:23 |
clarkb | ah | 21:24 |
dmsimard | the failure reason should be obvious without having to turn on debug :/ | 21:27 |
clarkb | dmsimard: I agree. Its just that its not an error for nodepool to get bad requests (sanitizing and handling external input and all that) | 21:31 |
clarkb | so the error should be raised on the requestor side imo | 21:31 |
clarkb | would it make sense to always treat it as an error in zuul if the request can't be fulfilled from nodepool? regardless of reason? | 21:32 |
Shrews | dmsimard: it's debug because you can have multiple providers, each with different labels, and any one of them can handle the request. we'd be littering the INFO with extraneous entries that would make it noisy | 21:40 |
Shrews | so if you have 4 providers, 3 could potentially decline the request because "invalid label", but the 4th might handle the request just fine | 21:41 |
Shrews | the problem with returning that info back to zuul is each provider pool might decline it for different reasons. | 21:43 |
clarkb | ya I think what is more important on the zuul side was I asked for X and everything declined it | 21:43 |
clarkb | the error would be I couldn't | 21:43 |
clarkb | get the resource I asked for anywhere | 21:43 |
Shrews | clarkb: yeah, something more substantial on the zuul side would be nice | 21:44 |
*** zigo has quit IRC | 21:59 | |
*** zigo has joined #zuul | 22:01 | |
*** threestrands has joined #zuul | 22:15 | |
*** threestrands has quit IRC | 22:21 |