leifmadsen | please note there is a lot of precursor "quickstart" notes here: https://etherpad.openstack.org/p/zuulv3-quickstart | 00:00 |
leifmadsen | SpamapS: ^^ | 00:02 |
leifmadsen | I need to get a couple small things out of the way, but then a "omg I just want to run some basic Zuul" is very high on my list, primarily because as every day passes, I realize I need it more and more | 00:02 |
leifmadsen | Next step in my notes is mostly just "run a job". Based on some examples I've seen, I think that's going to be relatively straightforward. Once that works, then it's basically me turning the notes into some documentation, which I'm not too worried about, as I've done a lot of that part. | 00:03 |
tristanC | leifmadsen: hey, fwiw we are about to release software-factory 2.7, and you could get a running v3 setup, with base jobs and a logserver already configured, in 3 commands | 00:08 |
leifmadsen | yea... not really what I'm looking for though | 00:08 |
leifmadsen | I understand everyone has push button infra now :) | 00:08 |
tristanC | "omg I just want to run some basic Zuul" sounds like what you need is actually a push button infra now :) | 00:11 |
leifmadsen | does it do GitHub integration too? | 00:11 |
leifmadsen | not looking to run gerrit | 00:12 |
tristanC | if you add github apitoken and webhook secret to the sfconfig.yaml, sf will add github connections and pipeline to zuul config | 00:13 |
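For readers following along: the real sfconfig.yaml schema is not shown in this log, so the following is purely a hypothetical sketch of the change tristanC describes; the key names are guesses, not the actual software-factory configuration format.

```yaml
# hypothetical sketch only -- consult the software-factory docs for the real keys
zuul:
  github_connections:
    - name: github.com
      api_token: <your-github-api-token>
      webhook_token: <your-webhook-secret>
```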
leifmadsen | I just think SF is going to have far too many moving parts for what I really want to explore | 00:13 |
leifmadsen | also, that doesn't help the "document zuul" effort :) | 00:13 |
leifmadsen | so it's more than just "run zuul" | 00:13 |
leifmadsen | otherwise, I'd look more at Windmill, or Hoist, or XYZ | 00:14 |
SpamapS | tristanC: I'm very excited that you are doing that. :) | 00:15 |
SpamapS | I wanted to use SF | 00:15 |
SpamapS | but it was still too weird for me so I just fell back on BonnyCI/hoist | 00:16 |
SpamapS | leifmadsen: I've got it on my list of things to work on tonight, where I'll have some quiet time in a hotel room. :) | 00:16 |
SpamapS | I have written up the entire bootstrap procedure in a GoDaddy context.... | 00:16 |
leifmadsen | SpamapS: cool, well I'd encourage you to go through what I've gotten so far for sure | 00:16 |
leifmadsen | the idea being to run a single VM, that gets events from GitHub, and trigger it to run a "Hello world" ansible playbook | 00:17 |
SpamapS | Just need to genericise the parts that are like "Go to the system that allocates service accounts in AD and get one in group Z" and make those "you'll need a user account on your cloud that can do A, B, C" | 00:17 |
leifmadsen | I gotta go put the kids to bed, or I'd elaborate more, but had a talk with mordred and jeblair before we started, so they know the general approach | 00:17 |
leifmadsen | SpamapS: that's exactly what I'm trying to avoid :) | 00:17 |
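The "Hello world" payload leifmadsen describes needs nothing more than a trivial playbook; a minimal sketch, with the file path and task names assumed:

```yaml
# playbooks/hello-world.yaml (name assumed) -- the entire job payload
- hosts: all
  tasks:
    - name: Print a greeting
      debug:
        msg: Hello world
```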
pabelanger | much backscroll | 00:18 |
SpamapS | pabelanger: most of it is just wanking from me. ;) | 00:21 |
leifmadsen | SpamapS: so I didn't read the whole scrollback, but what confuses me, is why master == 2.0, and feature/zuulv3 exists at all :) I'd actually have almost thought it'd be the other way around, and there be stable/2.0 and master (future 3.0) | 00:23 |
leifmadsen | I'm sure it's been asked and answered a 1000x though | 00:24 |
pabelanger | leifmadsen: FWIW: at summit, we were really close to getting a hello world job going on zuulv3 for the hands-on workshop. We were lacking an openstack cloud to run jobs on a node. I'm hoping by the next time we demo, something like the OCI nodepool driver will be finished, or I'll ask the openstack passport program to offer up some cloud resources | 00:26 |
mordred | pabelanger: or both! | 00:26 |
leifmadsen | pabelanger: didn't have an RDO Cloud login? | 00:27 |
leifmadsen | that's what I've been using anyways | 00:27 |
pabelanger | leifmadsen: we had 20 users running their own zuulv3, so the idea was each would have their own cloud creds | 00:27 |
pabelanger | mordred: ++ | 00:27 |
leifmadsen | gotcha | 00:27 |
mordred | pabelanger: one of the things that stood out to me doing the walkthrough with leif earlier is that a getting started guide that can use OCI or static or something similar as a step one, with "now you can plumb in your clouds" as step two feels like a nice incremental approach - once we can do that | 00:27 |
pabelanger | leifmadsen: also, might be interested in https://git.openstack.org/cgit/openstack-infra/publications/commit/?id=4f0a375f966171a81be4a1c76983c71359e037a1 for example playbooks for jobs. I did that for my JJB to ansible playbooks talks | 00:28 |
leifmadsen | someone (SpamapS?) mentioned the other day that I could just run against the executor itself too | 00:28 |
pabelanger | mordred: right, I think that is inline with what we talked about too | 00:28 |
mordred | "here's how you can get a zuul on your laptop that will run content all on your laptop with no clouds" - then "here's how you can add clouds" and then "here's how you can add managed base images" ... | 00:28 |
leifmadsen | pabelanger: cool, might be useful for TOAD I guess | 00:28 |
SpamapS | pabelanger: did you just do your hello world job without a nodeset? | 00:28 |
leifmadsen | mordred: +∞ | 00:29 |
SpamapS | Because I was just poking at doing a few dumb jobs on the executor today. | 00:29 |
SpamapS | Like, I have a few that just validate YAML | 00:29 |
pabelanger | SpamapS: we didn't; just noops, due to time. The plan was to use a trusted playbook on the executor | 00:29 |
SpamapS | don't need to install anything, just some python. Don't need a node for that. :) | 00:29 |
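A hedged sketch of the kind of executor-only job SpamapS is poking at, assuming an empty nodeset is how you tell Zuul not to request any nodes; the job and playbook names are made up:

```yaml
# runs its playbook on the executor itself; no nodepool node is requested
- job:
    name: yaml-lint
    run: playbooks/yaml-lint.yaml
    nodeset:
      nodes: []
```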
leifmadsen | I probably need to back up at some point and just drop nodepool entirely in the quickstart | 00:29 |
pabelanger | leifmadsen: yah, that's what we did. I was hoping to demo with RDO cloud on my laptop, but ran out of time | 00:30 |
clarkb | mordred: static to localhost would be super easy | 00:30 |
pabelanger | 90mins goes fast! | 00:30 |
clarkb | mordred: and not require any additional setup or software | 00:30 |
pabelanger | i should check if devconf.cz has a hands-on workshop session too | 00:32 |
leifmadsen | also, I've been wanting to run this on Fedora instead of Ubuntu | 00:32 |
leifmadsen | Fedora has gone well. Pretty sure I'll abandon CentOS for now. | 00:32 |
pabelanger | yah, I've dropped centos for now | 00:32 |
leifmadsen | maybe add some sidebar stuff later, but automating on Fedora is much easier | 00:32 |
pabelanger | but fedora works great | 00:32 |
leifmadsen | I mostly have zero interest in Ubuntu stuff | 00:32 |
clarkb | the only difference I would expect between the two is bubblewrap | 00:33 |
clarkb | everything else should be fairly transparent | 00:33 |
clarkb | virtualenv, run process, win | 00:33 |
pabelanger | clarkb: yah, having fedora shipping bwrap is nice. We need to get our bwrap into backports for xenial | 00:34 |
pabelanger | also, devconf.cz does have working groups :) We should totally do an installfest for zuulv3 | 00:34 |
clarkb | bwrap works great on tumbleweed too | 00:34 |
pabelanger | clarkb: when are we getting a DIB :D | 00:35 |
clarkb | pabelanger: dirk says its up for review | 00:35 |
pabelanger | cool | 00:35 |
clarkb | its on my list of things to do now that I'm home | 00:35 |
clarkb | (review the change) | 00:36 |
pabelanger | okay, submitted JJB to ansible talk again to devconf.cz | 00:48 |
*** jkilpatr has quit IRC | 01:05 | |
SpamapS | My Zuul runs on CentOS 7 | 03:24 |
SpamapS | with bubblewrap from rawhide | 03:24 |
SpamapS | pabelanger: ahh.. a week in snowy Brno. ;) | 03:32 |
pabelanger | :) | 03:33 |
SpamapS | ahhhh.. sweet sweet 1st class upgrade (even on a 60 minute flight.. so good) | 04:16 |
pabelanger | Where did we land on the tox with sudo job, did that ever get resolved? | 04:34 |
SpamapS | interesting | 07:13 |
SpamapS | I tried to make an executor-only job that runs a ruby program, but ruby does not like running in the bwrap | 07:13 |
SpamapS | http://paste.openstack.org/show/626236/ | 07:13 |
* SpamapS heads to bed to ponder this in dreamland | 07:14 | |
*** xinliang has quit IRC | 07:21 | |
*** xinliang has joined #zuul | 07:33 | |
*** xinliang has quit IRC | 07:33 | |
*** xinliang has joined #zuul | 07:33 | |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Add role to build Puppet module https://review.openstack.org/519489 | 07:56 |
*** bhavik has joined #zuul | 08:42 | |
openstackgerrit | Rui Chen proposed openstack-infra/nodepool feature/zuulv3: Fix nodepool cmd TypeError when no arguments https://review.openstack.org/519582 | 08:47 |
openstackgerrit | Rui Chen proposed openstack-infra/nodepool feature/zuulv3: Fix nodepool cmd TypeError when no arguments https://review.openstack.org/519582 | 08:57 |
*** bhavik has quit IRC | 09:23 | |
*** jianghuaw has quit IRC | 09:26 | |
*** rbergeron has quit IRC | 09:34 | |
*** rbergeron has joined #zuul | 09:34 | |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Allow run to be list of playbooks https://review.openstack.org/519596 | 09:53 |
pabelanger | fun review if anybody else is awake | 09:53 |
tobiash | pabelanger: this looks ok to me but I didn't fully understand your use case | 10:05 |
tobiash | the use case described in the commit message should also be possible with one playbook containing multiple plays | 10:06 |
tobiash | or did I overlook something? | 10:06 |
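The alternative tobiash is pointing at: one playbook file containing several plays, which Ansible runs in order with no change to Zuul's run syntax. A minimal sketch; the host groups and tasks are placeholders:

```yaml
# one playbook, two plays
- hosts: zuul
  tasks:
    - name: First play, targeting the zuul group
      debug:
        msg: configure zuul here
- hosts: nodepool
  tasks:
    - name: Second play, targeting the nodepool group
      debug:
        msg: configure nodepool here
```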
openstackgerrit | Akihiro Motoki proposed openstack-infra/zuul-jobs master: Fix npm-run-test https://review.openstack.org/518879 | 11:27 |
openstackgerrit | Akihiro Motoki proposed openstack-infra/zuul-jobs master: Fix npm-run-test https://review.openstack.org/518879 | 11:31 |
odyssey4me | Shrews is the request/response protocol documented anywhere? if that's the API, I'd rather be using it... | 11:46 |
*** tobiash has quit IRC | 12:04 | |
*** tobiash has joined #zuul | 12:06 | |
*** tobiash has quit IRC | 12:06 | |
*** tobiash has joined #zuul | 12:07 | |
*** jkilpatr has joined #zuul | 12:42 | |
Shrews | odyssey4me: it's described in http://specs.openstack.org/openstack-infra/infra-specs/specs/zuulv3.html but there is no proper documentation of the protocol itself | 12:47 |
Shrews | something we should strive to correct | 12:47 |
Shrews | but it's sort of fluid right now until we get a proper release | 12:48 |
odyssey4me | Shrews ah, ok - I think we can work that in... it definitely makes sense to... but yeah, using the same protocol as zuul for requests to nodepool makes a lot of sense | 12:50 |
Shrews | odyssey4me: this is the class zuul uses for the requests: http://git.openstack.org/cgit/openstack-infra/zuul/tree/zuul/model.py?h=feature/zuulv3#n526 | 12:51 |
Shrews | odyssey4me: and this is what nodepool uses for what it expects from a request and uses for fulfillment: http://git.openstack.org/cgit/openstack-infra/nodepool/tree/nodepool/zk.py?h=feature/zuulv3#n348 | 12:52 |
Shrews | the zuul class should be at least a subset of the nodepool class | 12:53 |
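As a rough illustration of the protocol Shrews links to: zuul creates a znode under /nodepool/requests/ whose JSON payload looks approximately like the following. The field names are read from the linked model.py and zk.py; treat this as a sketch of a still-fluid protocol, not a stable API.

```yaml
# data stored at /nodepool/requests/<priority>-<sequence number>
node_types:              # the labels being requested
  - v3-dib-fedora-27
requestor: zuul.example.com
state: requested         # nodepool moves this through pending/fulfilled/failed
declined_by: []          # launchers that could not satisfy the request
nodes: []                # filled in with node IDs on fulfillment
```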
odyssey4me | thanks Shrews - I'll be getting back to that in a week or two. I've unfortunately got to context switch right now to something else. :/ | 12:55 |
mordred | odyssey4me: darned context switching | 12:57 |
odyssey4me | yup, comes with the territory... | 12:57 |
mordred | ++ | 12:57 |
mordred | pabelanger: I agree with tobiash - the patch looks fine but I don't fully understand the words from the commit message | 12:59 |
mordred | pabelanger: the foo and bar playbooks you have in the test could totally be two different plays in the same playbook ... that said, I don't see a reason why run shouldn't be able to take a list | 13:00 |
mordred | pabelanger, tobiash: also, I think we should sit on that one until jeblair gets back - I'm not sure if there was an active reason run was a single playbook and not a list | 13:04 |
openstackgerrit | Akihiro Motoki proposed openstack-infra/zuul-jobs master: Fix npm-run-test https://review.openstack.org/518879 | 13:19 |
rcarrillocruz | mordred, pabelanger : are we good to +A https://review.openstack.org/#/c/453968/11 | 13:33 |
* SpamapS still trying to figure out bubblewrap + ruby fail :-P | 13:35 | |
rcarrillocruz | Shrews: pushed a revision on https://review.openstack.org/#/c/500800/5 last night, pls have a look when you get a sec | 13:38 |
mordred | SpamapS: oh good luck with that :) | 13:38 |
SpamapS | interesting... | 13:42 |
SpamapS | so on the executor, if I rw mount /var/lib/zuul into the bwrap, ruby works fine | 13:42 |
SpamapS | suggesting that ruby uses $HOME weirdly | 13:42 |
SpamapS | yep, that's it | 13:43 |
SpamapS | well that at least simplifies things. :) | 13:43 |
SpamapS | I wonder if we should set $HOME in the bubblewrap driver. | 13:43 |
SpamapS | we change it in /etc/passwd | 13:43 |
tobiash | SpamapS: in this case we should probably | 13:44 |
SpamapS | Yeah, easy patch, and I can't see the harm in it. | 13:46 |
SpamapS | /var/lib/zuul is totally inaccessible otherwise. | 13:46 |
mordred | SpamapS: yah - I think our intent is to make the workdir appear as $HOME - so if we also need to set the variable to get that to happen, seems sane to me | 13:48 |
mordred | SpamapS: setting $HOME should avoid the need to rw mount /var/lib/zuul if I'm reading you right, yeah? | 13:49 |
SpamapS | mordred: yeah, and mounting /var/lib/zuul would be counter to the bwrap mission in this case. :) | 13:50 |
SpamapS | (Or we could bind mount work_dir on top of $HOME .. but I kind of dislike that) | 13:50 |
SpamapS | we rewrote it in /etc/passwd, I think rewriting it in the environment makes a lot of sense. | 13:51 |
SpamapS | mordred: my intention here is to test out running nodeless jobs that do very little. | 13:51 |
SpamapS | this one just runs a silly markdown linter written in ruby | 13:51 |
SpamapS | So I'm thinking, just install that script on the executors and let it run on localhost. Also means we can skip most of the stuff I have in my usual base job that verifies nodes and pushes source. | 13:53 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Override HOME environment variable in bubblewrap https://review.openstack.org/519654 | 13:57 |
*** hashar has joined #zuul | 13:58 | |
*** hashar has quit IRC | 14:00 | |
*** jianghuaw_ has joined #zuul | 14:03 | |
*** dmsimard|off is now known as dmsimard | 14:05 | |
SpamapS | Ugh, cancelling a job that is being watched ... did not go well | 14:13 |
SpamapS | http://paste.openstack.org/show/626270/ | 14:13 |
mordred | SpamapS: I've got some refactoring of that stack on my TDL for this week | 14:14 |
SpamapS | I think I haven't pulled in a while too | 14:14 |
SpamapS | Been avoiding pulling until I can CI my CI | 14:14 |
SpamapS | mordred: if you can make stream.html just retry over and over if it gets an empty log.. that would be great. I constantly click it about 1s before output has started. | 14:15 |
SpamapS | hitting cmd-R is..hard? | 14:15 |
leifmadsen | SpamapS: ruby? doing something weird? say it ain't so! | 14:21 |
tobiash | SpamapS: I thought I had a fix for that | 14:22 |
leifmadsen | SpamapS: but once you have CI for your CI, how will you test your tester? | 14:23 |
* leifmadsen ducks | 14:23 | |
leifmadsen | ok, so I think I got far enough that I have a "hello world" job and a dummy base job to load it from. How do I go about ignoring the nodepool stuff for now and running directly on the executor? I'm not able to run the job right now because Zuul complains that it doesn't have permission to ssh-add /root/.ssh/id_rsa. That seems like a simple enough configuration problem, but I think it only does that because it is | 14:24 |
leifmadsen | trying to connect to the nodepool nodes? | 14:24 |
tobiash | SpamapS: https://review.openstack.org/#/c/514617/ that should have fixed the empty log | 14:25 |
tobiash | (if this was the only cause) | 14:25 |
SpamapS | leifmadsen: when I have CI for my CI I will test my tester by CD'ing my CI'd CI. | 14:30 |
SpamapS | leifmadsen: also, just using a markdown linter. | 14:30 |
SpamapS | If there's a better one in python, I'll use that. :) | 14:30 |
leifmadsen | :buddy_jesus_thumbs_up: | 14:30 |
SpamapS | tobiash: ah yah, just need to restart zuul-executor to get that one. :) | 14:32 |
tobiash | :) | 14:32 |
SpamapS | Oh right.. can't use command: on localhost on untrusted jobs. Well poo. I just need a container thingy then.. trusted jobs are a pain. | 14:38 |
* SpamapS goes back to running it on a tiny VM | 14:39 | |
kklimonda | I’d like to use zuul pipeline and dependent jobs to introduce “checkpoints” (so that zuul skips part of the pipeline that has succeeded on the previous failure and restarts from the first failed job) - i don’t think that’s possible now, but are there any potential risks in the scheduler I should be aware of before I start looking at the code? | 14:42 |
SpamapS | kklimonda: that would require keeping state somewhere. | 14:43 |
SpamapS | kklimonda: but, what restart are you thinking that you want to avoid re-testing? | 14:44 |
kklimonda | good point, that’s correct :) | 14:44 |
kklimonda | in the gate pipeline we’ll be building packages, then docker containers and then running integration testing | 14:45 |
kklimonda | That’s 50 minutes for packaging, another 30+ (probably more) to get containers ready | 14:46 |
kklimonda | All this to rerun a flaky test | 14:46 |
kklimonda | and we do that for 4 different distros, and different OS flavors | 14:48 |
kklimonda | A failure is costly | 14:48 |
kklimonda | So now in our zuul2 setup our jobs check for the existence of artifacts and return early | 14:48 |
kklimonda | But it would be nice to make zuul aware of that so it’s not reporting jobs as taking 30 seconds | 14:49 |
kklimonda | I have thought of adding a different job return status (EXIT_EARLY) to indicate that the job didn’t run (I’m not sure about overloading SKIPPED) but making zuul more aware of the pipeline status would be even nicer | 14:51 |
kklimonda | Obviously, after your comment, that's not something that can be done relatively easily | 14:51 |
*** jkilpatr has quit IRC | 14:53 | |
SpamapS | kklimonda: I think the way you're doing it is pretty nice actually. Your artifact repository is keeping state for you. | 14:54 |
SpamapS | kklimonda: I'd be concerned about missing changes though. What ID do you use to store/fetch them? | 14:55 |
kklimonda | hmm, right now it's a tuple of (change, patchset, job name) | 14:56 |
kklimonda | I believe, I’ll double check when I get to the computer - what should I keep in mind ? | 14:57 |
SpamapS | So, that's going to break if you have a long dependent pipeline. | 15:00 |
SpamapS | If you're single-repo, it's fine. | 15:00 |
SpamapS | But if you have 2 repos in there.. those won't change, but the parent may. | 15:00 |
SpamapS | I've struggled with this a lot with Zuul actually. Need something that stays with the build from the first moment through to after the merge. | 15:01 |
leifmadsen | ok that's weird... I killed the zuul-executor, and restarted it, and now it just dies in the background... | 15:01 |
leifmadsen | oh wait, I know why | 15:03 |
* leifmadsen facepalms | 15:03 | |
leifmadsen | I killed it, so there was no removal of the /var/run/ file | 15:03 |
kklimonda | SpamapS: ha, interesting point - that makes it all a non-starter basically. Thanks | 15:07 |
kklimonda | right now we are not utilizing dependent pipelines and cross-repo dependencies but that's one of the requirements for the newer system. | 15:08 |
kklimonda | but when you think about it, right now it's also broken | 15:10 |
kklimonda | well, we can always have a periodic job that will catch anything that slips through cracks ¯\_(ツ)_/¯ | 15:24 |
kklimonda | I think I'll need a drink | 15:24 |
*** jkilpatr has joined #zuul | 15:58 | |
*** hashar has joined #zuul | 15:59 | |
rbergeron | spamaps: I believe i have located the magical karaoke location nearby-ish if you haven't identified a place yet :) | 16:00 |
*** jkilpatr has quit IRC | 16:21 | |
*** jkilpatr has joined #zuul | 16:29 | |
*** jkilpatr has quit IRC | 16:46 | |
pabelanger | tobiash: mordred: yah, I figured the commit message would need more details, happy to explain. Today, this is the inventory file I use: http://git.openstack.org/cgit/openstack/windmill/tree/playbooks/inventory along with the entry point for ansible-playbook: http://git.openstack.org/cgit/openstack/windmill/tree/playbooks/site.yaml. Because of the way ansible loads includes, if I set up a single playbook run, with | 17:04 |
pabelanger | https://review.openstack.org/#/c/519596/1/tests/fixtures/config/ansible/git/common-config/zuul.yaml as nodesets, a single SSH connection will be used for the ansible-playbook runs. That messes up my variable include structure; the playbooks were written to be run either on a single host or against an inventory file with multiple different hosts. I can use the following nodeset also, | 17:04 |
pabelanger | https://review.openstack.org/#/c/519539/16/.zuul.d/jobs.yaml but that means I need to consume 6 nodes in nodepool. I am hoping ansible 2.4 might fix some of the variable issues by switching to include_playbooks vs include | 17:04 |
pabelanger | So, happy for feedback / suggestions, but in my testing last night, this seems to be the only way to get ansible playbooks to work in zuulv3 from the executor | 17:06 |
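For context, a sketch of what the change under review (519596) would allow: run as a list, where each entry is a separate ansible-playbook invocation, which is what resets the variable scoping pabelanger mentions. The playbook paths are illustrative.

```yaml
- job:
    name: windmill-deploy
    run:
      - playbooks/zuul.yaml
      - playbooks/nodepool.yaml
```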
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Add username to build and upload information https://review.openstack.org/453968 | 17:12 |
*** clarkb has quit IRC | 17:14 | |
Shrews | rcarrillocruz: i believe your last patch set on 453968 (the one you self-approved to fix the tests) exposes an issue, or at least an inconsistency | 17:16 |
Shrews | rcarrillocruz: i'm going to put up a fix | 17:16 |
*** hashar is now known as hasharAway | 17:17 | |
*** clarkb has joined #zuul | 17:22 | |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Be consistent with the ZK data model https://review.openstack.org/519706 | 17:27 |
Shrews | rcarrillocruz: ^^^ | 17:27 |
SpamapS | rbergeron: woot! | 18:13 |
rbergeron | spamaps: how goes your kool-aid drinking? is this your official orientation stuff? | 18:21 |
rbergeron | or was that in sunnyvale a while ago | 18:21 |
*** jkilpatr has joined #zuul | 18:30 | |
mordred | pabelanger: how would you split site.yaml into multiple playbooks otherwise? or would you basically put each of the playbooks listed in your site.yaml in the run list? | 18:55 |
pabelanger | mordred: yah, each in the run list for now is what I was thinking. The other option is to allow users to generate an inventory file like: http://git.openstack.org/cgit/openstack/windmill/tree/playbooks/inventory but with the same IP for the ansible_host setting of each node. That might be something good to support regardless. | 18:59 |
pabelanger | it would be a single node in nodepool, but ansible would think it is multiple hosts | 19:00 |
mordred | pabelanger: you can do that already ... | 19:01 |
mordred | pabelanger: just make a nodeset witha single node and that node assigned to multiple groups | 19:01 |
pabelanger | mordred: like https://review.openstack.org/#/c/519539/14/.zuul.d/jobs.yaml ? | 19:03 |
mordred | pabelanger: yup! | 19:06 |
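What mordred describes, roughly: a one-node nodeset where that node appears in several inventory groups. The label and group names here are illustrative.

```yaml
- nodeset:
    nodes:
      - name: primary
        label: v3-dib-fedora-27
    groups:
      - name: zuul
        nodes:
          - primary
      - name: nodepool
        nodes:
          - primary
```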
pabelanger | mordred: right, so that does not work as I would expect, as it relates to variable scoping. I think because ansible only uses a single SSH connection, variables are not reset between play runs as I would expect when running ansible-playbook multiple times | 19:07 |
mordred | pabelanger: gotcha | 19:07 |
pabelanger | mordred: the good news is, using 5 nodes in nodepool, playbooks work as expected | 19:09 |
mordred | \o/ | 19:10 |
Shrews | converting legacy devstack jobs to use the new native parent job is not as well documented as one would like | 19:49 |
Shrews | something better than zero would be nice | 19:50 |
dmsimard | ianw: so what we do for testing zuul callbacks right now is to set up a "nested" zuul in a multinode job but we're just running ansible-playbook, we're not actually exercising the executor | 20:00 |
dmsimard | ianw: nodeset and job is here: http://git.openstack.org/cgit/openstack-infra/zuul/tree/.zuul.yaml?h=feature/zuulv3#n16 | 20:01 |
dmsimard | I don't know to what extent we could use this approach to exercise base jobs or trusted playbooks from an executor's POV | 20:01 |
dmsimard | mordred, jeblair: has there been any new ideas regarding how to test config-repo/trusted execution ? The fact that these aren't integration tested still bothers me very much. | 20:02 |
dmsimard | I haven't come up with ideas that don't involve somehow setting up a nested zuul and, like, enqueuing a job manually.. but then that also requires an openstack cloud, ugh | 20:03 |
mordred | dmsimard: once we land static node support, we should be able to just make a two-node job that installs zuul on one, registers the othre as a static node and then runs a job | 20:07 |
SpamapS | rbergeron: yes, kool-aid drinking happening | 20:07 |
ianw | mordred: are there changes out there for that already? | 20:10 |
ianw | just thinking that maybe the nodepool dsvm jobs maybe aren't that far off having a zuul added to them | 20:13 |
mordred | ianw: https://review.openstack.org/#/c/468624/ | 20:16 |
dmsimard | mordred: so then what ? the nested zuul has to load all 2000 repositories' worth of configuration before being able to run and stuff ? | 20:19 |
mordred | dmsimard: nah - we'd just write out a config file with only a few repos listed I'd guess - or potentially even just use the git driver and point it at repos on the local disk *waves hands* | 20:20 |
mordred | dmsimard: I mean, I don't know the full answer yet, but I know we should at least be able to do something once static node is there without needing an openstack :) | 20:21 |
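For reference, the static-node support under review (468624) would presumably let nodepool.yaml describe pre-existing hosts along these lines; the field names are assumptions based on the proposed driver, not a final schema.

```yaml
providers:
  - name: static-provider
    driver: static
    pools:
      - name: main
        nodes:
          - name: node01.example.com
            labels:
              - centos-7
            username: zuul
```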
* dmsimard nods | 20:21 | |
dmsimard | progress | 20:21 |
mordred | yah. | 20:21 |
mordred | baby steps | 20:21 |
pabelanger | I'd be totally interested in how we'd trigger a job run in zuul for that | 20:22 |
*** hasharAway has quit IRC | 20:23 | |
mordred | pabelanger: magic? | 20:23 |
pabelanger | maybe fedmsg :) | 20:23 |
mordred | :) | 20:23 |
mordred | pabelanger: fedmsg is becoming the new AFS ... it's the answer we roll out for all the questions :) | 20:24 |
pabelanger | indeed! | 20:24 |
dmsimard | pabelanger: I was thinking just ... "zuul enqueue thing thing" | 20:37 |
dmsimard | Shrews: seeing NodeExists errors in our zookeeper.. which apparently can only come from here: http://git.openstack.org/cgit/openstack-infra/nodepool/tree/nodepool/zk.py?h=feature/zuulv3#n1308 | 20:54 |
dmsimard | Shrews: if I stop the (only) launcher we have, I stop seeing the error in the zk logs.. but they resume as soon as I start nodepool. I tried nothing and I'm all out of ideas. | 20:55 |
dmsimard | for example: http://paste.openstack.org/raw/626294/ | 20:56 |
Shrews | dmsimard: you tried nothing? lol | 20:59 |
Shrews | dmsimard: what does your config look like? | 20:59 |
Shrews | your pools need unique names | 20:59 |
Shrews | i'm guessing you have 2 pools under the np3 provider named "main" | 21:02 |
Shrews | or you have two providers named "np3" (which is less likely, but possible i guess) | 21:03 |
dmsimard | Shrews: I was mostly kidding, just not sure where to look. As far as I can tell, this is a config with one pool and one launcher | 21:06 |
dmsimard | I can pull up some configs, hang on | 21:06 |
Shrews | dmsimard: if it's not your config, then i'll have no idea since that's the only thing that i know it could possibly be | 21:08 |
dmsimard | Shrews: this is the nodepoolv3 nodepool.yaml: http://paste.openstack.org/raw/626296/ | 21:09 |
Shrews | dmsimard: if that's your config, and you're sure you're starting only a single launcher, that is truly a mystery then because that should not be possible | 21:11 |
Shrews | even with multiple launchers that shouldn't happen because the process ID is part of the launcher ID | 21:12 |
dmsimard | Shrews: /me googles how to query zk | 21:13 |
Shrews | dmsimard: oh, wait | 21:13 |
Shrews | that's not a nodepool error message | 21:13 |
Shrews | that's from zk | 21:13 |
Shrews | dmsimard: it's fine, unless your nodepool is not working | 21:14 |
dmsimard | yeah in fact the issue is a NODE_FAILURE but there's no real hints in zuul or nodepool logs | 21:14 |
dmsimard | at least afaict | 21:14 |
dmsimard | so I was digging around | 21:14 |
Shrews | dmsimard: the main pool worker loop always tries to register itself, but ignores that error | 21:15 |
dmsimard | Shrews: ah, okay, so red herring. | 21:15 |
Shrews | yeah | 21:15 |
Shrews | dmsimard: http://git.openstack.org/cgit/openstack-infra/nodepool/tree/nodepool/launcher.py?h=feature/zuulv3#n266 | 21:16 |
dmsimard | ah, okay. | 21:16 |
dmsimard | Shrews: these are the logs I'm getting: http://paste.openstack.org/raw/626297/ | 21:16 |
dmsimard | the debug logs aren't particularly helpful, only adding a line of context: http://paste.openstack.org/raw/626298/ | 21:20 |
Shrews | dmsimard: i can't tell much from that, but given the config you pasted, and that output, i'm guessing you're asking for a "centos-oci" node type, but that is not a valid label for that launcher. | 21:21 |
Shrews | v3-dib-centos-7 is defined | 21:21 |
dmsimard | oh man that's it | 21:21 |
dmsimard | 2017-11-14 21:21:21,281 DEBUG nodepool.driver.openstack.OpenStackNodeRequestHandler[zuulv3.27-rc1.com-29912-PoolWorker.np3-main]: Declining node request 200-0000000007 because node type(s) [centos-oci] not available | 21:21 |
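So the mismatch: the job's nodeset asked for the label centos-oci, while the launcher only defines v3-dib-centos-7. The quick fix on the zuul side is presumably to request a label the launcher actually has; the job and node names here are illustrative.

```yaml
- job:
    name: hello-world
    nodeset:
      nodes:
        - name: test-node
          label: v3-dib-centos-7
```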
dmsimard | that probably should be raised from debug ? | 21:21 |
dmsimard | I mean, INFO or something | 21:21 |
Shrews | yeah, was gonna say there should be a message | 21:22 |
Shrews | i think debug is appropriate, IMO | 21:22 |
clarkb | zuul side should probably error though? | 21:22 |
clarkb | its not an error on the nodepool side, but if zuul is requesting invalid labels it would be an error to zuul? | 21:23 |
Shrews | zuul knows (and logs) that the request failed, but doesn't know the reason | 21:23 |
Shrews | we don't pass that info thru zk | 21:23 |
clarkb | ah | 21:24 |
dmsimard | the failure reason should be obvious without having to turn on debug :/ | 21:27 |
clarkb | dmsimard: I agree. Its just that its not an error for nodepool to get bad requests (sanitizing and handling external input and all that) | 21:31 |
clarkb | so the error should be raised on the requestor side imo | 21:31 |
clarkb | would it make sense to always treat it as an error in zuul if the request can't be fulfilled from nodepool? regardless of reason? | 21:32 |
Shrews | dmsimard: it's debug because you can have multiple providers, each with different labels, and any one of them can handle the request. we'd be littering the INFO with extraneous entries that would make it noisy | 21:40 |
Shrews | so if you have 4 providers, 3 could potentially decline the request because "invalid label", but the 4th might handle the request just fine | 21:41 |
Shrews | the problem with returning that info back to zuul is each provider pool might decline it for different reasons. | 21:43 |
clarkb | ya I think what is more important on the zuul side was I asked for X and everything declined it | 21:43 |
clarkb | the error would be I couldn't | 21:43 |
clarkb | get the resource I asked for anywhere | 21:43 |
Shrews | clarkb: yeah, something more substantial on the zuul side would be nice | 21:44 |
*** zigo has quit IRC | 21:59 | |
*** zigo has joined #zuul | 22:01 | |
*** threestrands has joined #zuul | 22:15 | |
*** threestrands has quit IRC | 22:21 |