jlk | I may have messed something up here | 00:00 |
---|---|---|
mordred | jlk: pip freeze won't show setuptools | 00:00 |
mordred | jlk: pbr freeze will though | 00:00 |
jlk | good point, I'm too used to doing _everything_ inside venvs | 00:01 |
SpamapS | Ok I think I've got a pretty good working thing now | 00:01 |
mordred | jlk: (also, you might like pbr freeze more than pip freeze anyway - it'll show you git shas for anything that has them recorded in their metadata) | 00:01 |
clarkb | pip list too | 00:01 |
mordred | which is everything that uses pbr, fwiw | 00:01 |
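
A quick illustration of the difference being discussed, run inside whatever environment you want to inspect (the grep pattern is just an example; the note about git shas is from mordred's description above, and exact output format may vary by pbr version):

```shell
# pip freeze lists installed packages but omits setuptools by default
pip freeze | grep -i zuul

# pbr freeze lists the same packages, and for anything built with pbr it
# also shows the git sha recorded in the package metadata
pbr freeze | grep -i zuul
```
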
* mordred hasn't used pip freeze in a couple of years now | 00:02 | |
jlk | I have PTSD from pbr, so I haven't touched it | 00:02 |
mordred | awww | 00:02 |
SpamapS | damnit... | 00:02 |
SpamapS | KeyError: 'getpwuid(): uid not found: 1000' | 00:03 |
SpamapS | so close | 00:03 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul master: Add Dockerfile for running tests https://review.openstack.org/448314 | 00:05 |
SpamapS | well I'm 5 minutes over on EOD | 00:05 |
SpamapS | jlk: ^ that kinda works for me | 00:05 |
SpamapS | except when I try to run as non root | 00:05 |
SpamapS | not sure how to plumb in my user | 00:05 |
jlk | hrm, pbr needs more than git-core it would seem | 00:07 |
jlk | oh own-goal | 00:08 |
*** rattboi has left #zuul | 00:09 | |
*** rattboi has joined #zuul | 00:09 | |
*** rattboi is now known as rattboi-test | 00:11 | |
*** rattboi-test is now known as rattboi | 00:11 | |
openstackgerrit | Joshua Hesketh proposed openstack-infra/nodepool feature/zuulv3: Merge branch 'master' into feature/zuulv3 https://review.openstack.org/445325 | 00:27 |
jhesketh | pabelanger: ^ | 00:27 |
*** harlowja has joined #zuul | 00:42 | |
openstackgerrit | K Jonathan Harker proposed openstack-infra/zuul feature/zuulv3: Perform pre-launch merge checks https://review.openstack.org/446275 | 00:44 |
openstackgerrit | K Jonathan Harker proposed openstack-infra/zuul feature/zuulv3: Perform pre-launch merge checks https://review.openstack.org/446275 | 00:48 |
jlk | oh interesting | 01:38 |
jlk | SpamapS: on my system, I'm running docker as my user, and while it's "root" inside the container, it's actually my UID. It's writing things to the filesystem that show up as my UID/GID when I look at them outside the container. | 01:38 |
jlk | BWAHAHAHA. My container got named angry_edison and I am amused. | 01:41 |
*** harlowja has quit IRC | 03:35 | |
jeblair | jlk: watch out for vengeful_tesla | 04:16 |
SpamapS | jlk: right, but I want it to not think it is root inside. | 05:10 |
SpamapS | perhaps that is a bad idea | 05:10 |
SpamapS | though I had trouble when they ran "as root" | 05:17 |
SpamapS | 590ee83c155d zuuldev "/bin/sh -c tox" 34 seconds ago Up 32 seconds dreamy_stallman | 05:24 |
SpamapS | the hits keep on coming | 05:24 |
SpamapS | jlk: also I think the way you're doing it, you have a VM between you and docker (docker-machine) so that's likely why the ownership stays you | 05:45 |
SpamapS | for me, if I'm root in the container, volume touched files are root owned | 05:45 |
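
A sketch of one way to plumb the host user in, per the discussion above: pass the host UID/GID to `docker run`, and make sure the image has a matching passwd entry, otherwise lookups can fail exactly like the `getpwuid(): uid not found: 1000` error earlier. The image name and paths are hypothetical.

```shell
# Build-time (hypothetical Dockerfile line): create a user in the image
# whose UID matches the host user, so getpwuid() lookups succeed:
#   RUN useradd -m -u 1000 zuuldev

# Run-time: run the container as the host UID/GID and mount the source tree,
# so files written through the volume keep the host ownership
docker run --rm \
  --user "$(id -u):$(id -g)" \
  -v "$(pwd):/home/zuuldev/src" \
  zuuldev /bin/sh -c "cd /home/zuuldev/src && tox"
```
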
SpamapS | best way to slow down zuul unit tests seems to be to run it on aufs | 05:59 |
* SpamapS looks at how to make container rootfs == tmpfs | 06:00 | |
SpamapS | oo neat, --tmpfs /tmp makes it go fast | 06:03 |
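
For reference, the flag in question (image name is hypothetical; the size option is optional):

```shell
# Mount a fresh tmpfs at /tmp inside the container so the test suite's
# temp-file churn never hits the copy-on-write storage driver
docker run --rm --tmpfs /tmp:rw,size=2g zuuldev /bin/sh -c tox
```
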
openstackgerrit | Jamie Lennox proposed openstack-infra/nodepool feature/zuulv3: Refactor nodepool apps into base app https://review.openstack.org/448395 | 06:05 |
SpamapS | hrm.. running tests in my container fails for weird reasons | 06:11 |
SpamapS | Ran 51 (-6) tests in 179.562s (+39.656s) | 06:11 |
SpamapS | FAILED (id=10, failures=8 (+3)) | 06:11 |
*** isaacb has joined #zuul | 06:26 | |
*** isaacb has quit IRC | 07:13 | |
*** isaacb has joined #zuul | 07:23 | |
SpamapS | ok, docker was a huge mistake. | 08:05 |
* SpamapS deletes it forever | 08:05 | |
* SpamapS got 6 timeouts even in single-threaded mode under docker. | 08:06 | |
*** hashar has joined #zuul | 08:19 | |
openstackgerrit | Joshua Hesketh proposed openstack-infra/zuul feature/zuulv3: Remove url_pattern config parameter https://review.openstack.org/447165 | 11:08 |
openstackgerrit | Joshua Hesketh proposed openstack-infra/zuul feature/zuulv3: Simplify the log url https://review.openstack.org/438028 | 11:08 |
*** hashar is now known as hasharLunch | 11:08 | |
openstackgerrit | Joshua Hesketh proposed openstack-infra/zuul feature/zuulv3: Remove url_pattern config parameter https://review.openstack.org/447165 | 11:46 |
openstackgerrit | Joshua Hesketh proposed openstack-infra/zuul feature/zuulv3: Simplify the log url https://review.openstack.org/438028 | 11:46 |
Shrews | jhesketh: why do you think the nodepool_id feature might need to be removed? | 12:27 |
jhesketh | Shrews: I'm not sure it will.. I suspect it's useful, but once we stop running the old nodepool it may not be necessary | 12:35 |
Shrews | jhesketh: the merge, as presented by gerrit, confuses me. Are we going to lose the current working code for test_leaked_node for the old broken version if we approve that? | 12:40 |
Shrews | i really hate to see skips added back, especially for tests that are working now :( | 12:42 |
jhesketh | Shrews: it's only tests on the v3 branch | 12:42 |
jhesketh | so another commit to turn them back on | 12:42 |
jhesketh | rather than fixing the test in the merge commit making it even longer | 12:42 |
Shrews | jhesketh: i understand the skips on the new tests that v3 didn't have. it's the existing test i'm concerned about. looks to me like the merge breaks it (and then skips it) | 12:43 |
Shrews | test_leaked_node_with_nodepool_id and test_leaked_node_not_deleted are new. that's fine to fix in a later review. but test_leaked_node works now | 12:44 |
jhesketh | Shrews: okay, that's fair | 12:44 |
jhesketh | something changed in the merge to break them, but you're right, they should probably be fixed in the merge commit rather than later on | 12:45 |
jhesketh | I'll have to look tomorrow though because it's nearly midnight and this wine is nice | 12:46 |
Shrews | jhesketh: mmm, wine. enjoy! | 12:46 |
mordred | yah - I actually agree more that it may need to be removed - we don't need nodepool_id in the zk version because we track ownership via zk, no? | 12:46 |
mordred | that was the hack for v2 to not delete v3 nodes | 12:46 |
Shrews | mordred: this entirely depends on what you do with your json change :) | 12:47 |
jhesketh | plus I just got my esp8266 reading temperature correctly so I can continue writing a logger/server to monitor what might become a cellar ;-) | 12:47 |
Shrews | mordred: did you see the note i left on that review about breaking the world if you don't abandon it? | 12:47 |
Shrews | mordred: my latest comment here https://review.openstack.org/#/c/297950/ | 12:49 |
mordred | Shrews: oh - yeah - let's just abandon that for v2 | 12:50 |
Shrews | hrm, i actually don't understand the purpose of test_leaked_node_not_deleted | 12:54 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Make sure services are running for test-setup.sh https://review.openstack.org/448555 | 12:54 |
pabelanger | morning | 12:54 |
Shrews | oh, the nodepool_id test | 12:55 |
*** hasharLunch is now known as hashar | 13:05 | |
jhesketh | night all! | 13:12 |
tobiash_ | mordred: did some short testing about io footprint optimization | 13:53 |
tobiash_ | I tested with a ccached big c++ project | 13:54 |
mordred | tobiash_: how did it go? | 13:54 |
tobiash_ | a combination of increasing the ext4 commit interval (> build duration) and a set of vm.dirty_* settings saved me about 2.6GB of writes | 13:55 |
mordred | nice! the vm.dirty_ settings have to be set on the host, right? | 13:55 |
mordred | (rather than the guest) | 13:56 |
tobiash_ | mordred: I just do this at the beginning of the build job: http://paste.openstack.org/show/603765/ | 13:56 |
mordred | tobiash_: oh neat! | 13:57 |
tobiash_ | without this the dirty pages graph (which contains unwritten data) goes up and down | 13:57 |
tobiash_ | with this it goes up more or less monotonically (if there's enough ram), inhibiting most writes | 13:58 |
tobiash_ | if this proves giving good results in a wider range, these settings could just be baked into the dib image nodepool builds | 13:59 |
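
The paste isn't reproduced here, but the combination described is roughly the following; the values are placeholders, not the ones tobiash_ actually used:

```shell
# Let dirty pages accumulate in RAM instead of being flushed mid-build
sudo sysctl -w vm.dirty_ratio=80
sudo sysctl -w vm.dirty_background_ratio=75
sudo sysctl -w vm.dirty_expire_centisecs=60000

# Raise the ext4 journal commit interval (seconds) past the expected build
# duration, so the filesystem itself also defers writes
sudo mount -o remount,commit=900 /
```
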
*** isaacb has quit IRC | 14:00 | |
mordred | yah - I'm going to test to see its effects on the openstack side real quick | 14:02 |
tobiash_ | depending on what you do in the job this might need to be combined with eatmydata | 14:04 |
mordred | tobiash_: pushed up a quick test: https://review.openstack.org/448591 | 14:05 |
tobiash_ | the build time itself (did not test many parallel jobs at once) was unaffected as the writes typically are done asynchronously | 14:05 |
mordred | tobiash_: in a few of our clouds we're our own noisy-neighbor - so even if the only effect is reducing load on the underlying cloud, it still might be a win for us | 14:06 |
mordred | verifying that will be a little bit of work of course :) | 14:06 |
tobiash_ | yepp, that was my initial idea to reduce noisy-neighbor behaviour | 14:07 |
*** isaacb has joined #zuul | 14:11 | |
tobiash_ | one possible side effect of this: if so much data is unwritten that a flush is forced, so much could be written at once that dmesg logs a warning like this: | 14:11 |
tobiash_ | INFO: task jbd2/sdb1-8:612 blocked for more than 120 seconds. | 14:11 |
tobiash_ | could happen in a situation where 15gb are unwritten, 2gb are free and a process wants 8gb; then 6gb would need to be synced to disk at once | 14:13 |
pabelanger | http://paste.openstack.org/show/603769/ | 14:17 |
pabelanger | mordred: managed to get dox working^ | 14:17 |
pabelanger | https://review.openstack.org/#/c/448555/ was the only patch to zuul needed | 14:18 |
Shrews | pabelanger: dox??? oy | 14:27 |
Shrews | i'm going to have to re-learn that code now, aren't i? :) | 14:28 |
pabelanger | Shrews: :) | 14:29 |
pabelanger | it's actually not that bad | 14:29 |
pabelanger | obviously we need some new images | 14:29 |
pabelanger | but, once I got my dox.yaml file setup | 14:29 |
pabelanger | things worked as expected | 14:29 |
*** bhavik1 has joined #zuul | 15:13 | |
*** isaacb has quit IRC | 15:14 | |
jeblair | clarkb: should we abandon https://review.openstack.org/436544 now? | 15:51 |
clarkb | ya I can abandon it | 15:53 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul master: Add Dockerfile for running tests https://review.openstack.org/448314 | 15:57 |
jeblair | SpamapS: i can't keep up with your on-again / off-again relationship with docker :) | 15:58 |
SpamapS | jeblair: I know | 15:58 |
SpamapS | it's awfu | 15:58 |
SpamapS | l | 15:59 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul master: Add Dockerfile for running tests https://review.openstack.org/448314 | 15:59 |
SpamapS | jeblair: that's just me dumping my local changes and context switching back into not-docker | 15:59 |
SpamapS | The reality is.. it was kind of fun to work with | 16:00 |
SpamapS | but it can't even run our test suite fast enough to avoid the hard timeouts. | 16:00 |
jeblair | :( | 16:00 |
jeblair | SpamapS: that seems strange to me based on what i think i know about containers, but i definitely don't want to fall into that rabbit hole i see you're in, so i'm just going to look away :) | 16:01 |
SpamapS | jeblair: I'm pretty sure it's the overlay filesystem | 16:01 |
SpamapS | system CPU usage was _very_ high | 16:01 |
jeblair | ah | 16:01 |
SpamapS | could dink around with btrfs or lvm | 16:01 |
SpamapS | but... | 16:02 |
SpamapS | at some point | 16:02 |
jeblair | SpamapS: and the tmpfsing didn't help? | 16:02 |
SpamapS | running the tests on a VM works fine | 16:02 |
SpamapS | jeblair: it did a little. | 16:02 |
pabelanger | SpamapS: jeblair: do you mind reviewing 448042? that is our first step to green jobs again for zuulv3-dev | 16:02 |
SpamapS | but even with 1 thread I still got 6 alarm clocks | 16:03 |
SpamapS | pabelanger: will do that shortly | 16:03 |
jeblair | pabelanger: we haven't landed the workspace var yet have we? | 16:04 |
*** bhavik1 has quit IRC | 16:04 | |
pabelanger | jeblair: not yet, I restacked it on 448042 | 16:04 |
jeblair | cool | 16:04 |
pabelanger | that stack is now green too | 16:04 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Remove irrelevant test_merging_queues https://review.openstack.org/446768 | 16:04 |
*** bhavik1 has joined #zuul | 16:04 | |
*** harlowja has joined #zuul | 16:06 | |
*** bhavik1 has quit IRC | 16:11 | |
jeblair | pabelanger: +2s down the stack until a -1 at the end. i'll leave the +3 for you after SpamapS weighs in. i dunno if you want to ping clarkb on them too. | 16:14 |
pabelanger | jeblair: sure, more people the better! | 16:14 |
clarkb | which stack? | 16:15 |
jeblair | clarkb: starts at https://review.openstack.org/448042 | 16:15 |
jeblair | pabelanger: can you elaborate on the socket stuff you added in 446683? | 16:18 |
jeblair | pabelanger: oh, is it because we used to have something equivalent to wait until a host was up and serving ssh (while we tried to ssh and run ready script), so you want to get similar fine-grained failure messages? | 16:19 |
pabelanger | jeblair: right, with SSHClient, it would raise a socket exception, however Transport doesn't. | 16:20 |
pabelanger | So, it would result in spamming the logs: http://logs.openstack.org/83/446683/5/check/gate-dsvm-nodepool/7c4e0b6/logs/screen-nodepool.txt.gz | 16:20 |
pabelanger | and had no easy way to trap the exception | 16:20 |
pabelanger | however, open to suggestions on making it better | 16:21 |
SpamapS | pabelanger: reviewed 448042.. heading down stack | 16:22 |
* SpamapS may still be a little bit decaffeinated and thus carrying a +1 cranky buff | 16:24 | |
jeblair | pabelanger: i left a -0 on 446683; can you take a look at that; and i'd like clarkb and Shrews to take a(nother) look. | 16:25 |
pabelanger | jeblair: sure | 16:25 |
SpamapS | ew | 16:39 |
SpamapS | /tmp/console.log should be /run/console.log, FYI | 16:39 |
SpamapS | (predictable filenames in /tmp are basically always a terrible idea) | 16:39 |
SpamapS | and really /run/zuul-prepared-workspace/console.log is better (just reading roles/prepare-workspace) | 16:40 |
jeblair | SpamapS: what's the attack vector/endgame there exactly? | 16:40 |
jeblair | SpamapS: console.log could end up being rather larger than one might want on a tmpfs. | 16:41 |
jeblair | (i mean, if a job wanted to replace the console log, it could do regardless of where it's stored; it will have rights to do that) | 16:43 |
SpamapS | jeblair: it's super low risk, I know, but I prefer to viciously eliminate all use of /tmp/staticanything than try to reason about every attack vector surrounding symlinks as predictable files in /tm | 16:46 |
SpamapS | damnit, my enter is beating my letters too often | 16:47 |
SpamapS | With throwaway nodes, I know we don't have to worry | 16:47 |
SpamapS | But I don't like putting bad practice into code that others will consume and possibly cargo cult without knowing. | 16:47 |
jeblair | SpamapS: okay, but in this case, we're talking about storing potentially huge files in a tmpfs, and expanding zuul's footprint on the node. | 16:48 |
jeblair | SpamapS: i don't accept it's bad practice. :) | 16:48 |
SpamapS | jeblair: /tmp/${randstring} then, and randstring=$(cat /run/mydir/wheresmystring) | 16:49 |
SpamapS | also, /var/tmp is for big files | 16:49 |
SpamapS | /tmp is not | 16:49 |
jeblair | SpamapS: no i mean i understand the issue | 16:49 |
jlk | SpamapS: yeah I'm not sure what happens on OSX any more. It's no longer "docker machine", it's a native thing? | 16:50 |
jlk | or it's a very very well hidden vm | 16:50 |
SpamapS | jlk: oh? wild | 16:50 |
jeblair | SpamapS: what i'm saying is that this is something we have reasoned about at length, and come to a conclusion. i don't like to blindly follow conventional wisdom. i think this is something to think about carefully. | 16:50 |
jlk | ah | 16:51 |
jlk | "The Docker engine is running in an Alpine Linux distribution on top of an xhyve Virtual Machine on Mac OS X" | 16:51 |
SpamapS | jeblair: fair enough. I have not taken the time to reason about it because I've accepted that it's always a bad idea and have not been proven wrong, nor have I attempted to re-evaluate that position. I'm entirely willing to ignore this case in the face of those who have taken time to think hard about it. | 16:52 |
SpamapS | jlk: yeah, well hidden is right! | 16:52 |
clarkb | is xhyve a port of bhyve to os x? | 16:52 |
SpamapS | probably | 16:52 |
jlk | Yes | 16:52 |
jlk | https://github.com/mist64/xhyve | 16:53 |
SpamapS | a google search for 'predictable filename tmp' reveals a pretty awful list of CVE's though | 16:53 |
SpamapS | so I'd have to really want to have my mind changed | 16:53 |
* SpamapS is fully submerged in the confirmation bias now | 16:53 | |
SpamapS | crap I have a meeting in 7 minutes and 8 minutes of prep | 16:53 |
* SpamapS de-ircs for a moment | 16:54 | |
jeblair | SpamapS: i'm happy to talk about it, or alternatives, if we can fast-forward past the part where we assume the author (o/) is not aware of the issues and hasn't thought of it. :) | 16:54 |
SpamapS | jeblair: No assumption is being made about the author. Only questions from a stubborn old greybeard. ;) | 16:54 |
jeblair | SpamapS: okay. happy to continue when we have some more time (it will take a bit). | 16:55 |
Shrews | jeblair: pabelanger: i think we can go ahead and move forward with https://review.openstack.org/447108 and https://review.openstack.org/447109 today, if you two agree | 17:06 |
pabelanger | sure | 17:07 |
jeblair | ++ | 17:08 |
Shrews | great | 17:09 |
clarkb | ok I think I have gotten past "setuptools is broken" and now need caffeine and breakfast, then will review pabelanger's stack | 17:19 |
mordred | clarkb: oh good - I love it when setuptools breaks | 17:19 |
*** hashar has quit IRC | 17:29 | |
Shrews | jeblair: rbergeron: just sent you two an email regarding doc things | 17:29 |
Shrews | enjoy at your leisure | 17:29 |
* Shrews decides afternoon coffee is a good idea at this point | 17:30 | |
pabelanger | 2017-03-22 17:41:44,015 INFO nodepool.NodePool: Starting ProviderWorker.infracloud-vanilla | 17:41 |
pabelanger | Shrews: ^ | 17:41 |
Shrews | yeah. and first bug found | 17:42 |
Shrews | min-ready nodes not being started for a new provider | 17:43 |
pabelanger | ya, noticing that | 17:43 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Remove ZooKeeperConnectionConfig class https://review.openstack.org/447683 | 17:44 |
Shrews | oh, no. that's actually correct | 17:45 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Fix hostname issue with nodepool integration job https://review.openstack.org/448239 | 17:45 |
Shrews | it's per label, not provider | 17:45 |
Shrews | so yay | 17:46 |
jeblair | cool, so after some use, we should see some min-readies start to pop up | 17:46 |
Shrews | i'm going to delete a node, just to see if vanilla catches it | 17:47 |
jlk | SpamapS: where did you mount the tmpfs? | 17:47 |
Shrews | chocolate is greedy :( | 17:49 |
* Shrews waits for another jlk or jeblair patch bomb | 17:49 | |
jlk | oh no | 17:49 |
jeblair | Shrews: oh, i do have a stack that needs revising... | 17:50 |
Shrews | \o/ | 17:50 |
jlk | jeblair: You had mentioned something about using tmpfs to speed up tox, where did you make the tmpfs? in .tox/ ? | 17:53 |
jeblair | export ZUUL_TEST_ROOT=/tmpfs | 17:53 |
jeblair | jlk: i do that ^ | 17:53 |
jeblair | jlk: so you can mount one anywhere, then tell zuul unit tests to use it that way | 17:54 |
jlk | I see, and that tells, tox to dump stuff there? | 17:54 |
jeblair | jlk: it's internal to zuul's tests. so whenever zuul creates a tmpdir (like ALL THE TIME) it makes one there | 17:55 |
jlk | I see | 17:55 |
jeblair | jlk: i, er, probably could have just used TMPDIR env variable, but i don't think i realized python tmpdir respected that at the time. | 17:55 |
jeblair | it's old. | 17:55 |
jeblair | for that matter, i reckon that would probably just transparently work too; i haven't tried it. :) | 17:56 |
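
Putting the above together, a minimal recipe (mount point and size are arbitrary); TMPDIR might work too, as speculated above, but ZUUL_TEST_ROOT is the knob the tests are known to honor:

```shell
# Mount a tmpfs anywhere and point zuul's test suite at it
sudo mkdir -p /tmpfs
sudo mount -t tmpfs -o size=4g tmpfs /tmpfs
export ZUUL_TEST_ROOT=/tmpfs
tox -e py27
```
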
clarkb | Shrews: in v2 allocations were proportional to total provider quota. But I think now its whoever can respond to a request first ya? | 18:02 |
clarkb | at least at zero usage | 18:02 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Add 'allow-secrets' pipeline attribute https://review.openstack.org/447138 | 18:02 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Isolate encryption-related methods https://review.openstack.org/447087 | 18:02 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Augment references of pkcs1 with oaep https://review.openstack.org/447088 | 18:02 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Add support for job allowed-projects https://review.openstack.org/447134 | 18:02 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Serve public keys through webapp https://review.openstack.org/446756 | 18:02 |
jlk | hrm, 4 minutes in docker to run pep8 | 18:03 |
Shrews | clarkb: yeah. though each request is for a single node, so that gives each provider at least an opportunity to satisfy it | 18:03 |
pabelanger | that did something for infracloud-vanilla | 18:03 |
Shrews | cool, i see vanilla nodes | 18:03 |
jeblair | and providers should respond more slowly as they get busier; if that's not enough, we can borrow a gearman trick from zuul v2.5 and start adding proportional sleeps to the algorithm. | 18:04 |
jlk | LOL vs 32 seconds on the VM. WTF. | 18:04 |
clarkb | I approved the bottom change of pabelanger's stack (please let me know now if there are still more reviewers interested in it but looks like it got a lot of review) | 18:05 |
pabelanger | clarkb: great. I think you are the last reviewer atm. So, you should be safe to go up to 441617 | 18:06 |
clarkb | thats what I thought, thanks | 18:06 |
pabelanger | Shrews: but, if we had a nodepool-launcher, per provider, each would have 2 min-ready nodes, right? | 18:07 |
pabelanger | or still 2 across launchers | 18:07 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Add 'allow-secrets' pipeline attribute https://review.openstack.org/447138 | 18:07 |
Shrews | pabelanger: no. it would still see 2 ready 'label' nodes | 18:08 |
pabelanger | k | 18:08 |
Shrews | pabelanger: assuming the labels in each config were identically named, that is | 18:09 |
pabelanger | right | 18:09 |
clarkb | pabelanger: et al one thing I notice reading http://zuulv3-dev.openstack.org/logs/1ce3b8446a594d8c8f07092786aba219/console.log to review that change is we don't annotate the console log with what script is being run | 18:11 |
clarkb | not sure if we want to have the logger handle that globally or just modify our scripts to echo some info about themselves as a form of header or what, but it would make following the console log flow easier I think | 18:11 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Create extra-test-setup role https://review.openstack.org/448042 | 18:13 |
pabelanger | clarkb: Ya, so that is an interesting problem now. Because there isn't any stdout when ansible runs the script, which means zuul_stream cannot append it to console.log. I'm not sure how best to fix that | 18:14 |
jeblair | mordred: ^ | 18:14 |
pabelanger | and agree, it will be confusing for people that just look at console.log | 18:14 |
jeblair | i know that if we *streamed* the log, we would see what was actually being run | 18:14 |
jeblair | the solution might be to copy something produced by the callback plugin rather than the console log on the host? | 18:15 |
clarkb | another log observation is we don't seem to capture where the job ran? | 18:16 |
pabelanger | ya, we need to add that still | 18:16 |
pabelanger | on my list | 18:16 |
clarkb | in one way that's nice: this was an "ubuntu-xenial" host and details are abstracted. On the other hand, the two py27 jobs took vastly different amounts of time to run and I'm wondering if that's related to cloud/region or the changes themselves etc | 18:17 |
clarkb | pabelanger: for that I think a host info role like our net info macro in jjb probably makes sense | 18:17 |
clarkb | just echo the hostname and some basic host networking information | 18:17 |
pabelanger | ++ | 18:18 |
pabelanger | going to do something like: http://logs.openstack.org/17/441617/10/check/gate-zuul-pep8-ubuntu-xenial/621bbeb/_zuul_ansible/pre_playbook | 18:18 |
pabelanger | and use zuul_stream | 18:18 |
pabelanger | I'll do that now | 18:18 |
clarkb | what is zuul_stream? | 18:20 |
pabelanger | our process that runs on the worker to add things into console.log | 18:20 |
pabelanger | mordred: created it as an ansible task | 18:21 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Create zuul_workspace_root job variable https://review.openstack.org/441441 | 18:21 |
clarkb | I remember reviewing those changes, but there were many and the beginning state was completely different than the end state so I'll admit to not really having it all sorted out | 18:21 |
pabelanger | actually, I think it changed a little from zuulv2.5 | 18:21 |
clarkb | ya its different | 18:21 |
pabelanger | so maybe just an echo like you said | 18:22 |
Shrews | n-l handled that workload well with the additional provider. chalking that up as a success. | 18:22 |
clarkb | iirc we fork a process on the nodepool node and every playbook that runs there writes to a socket? | 18:22 |
clarkb | and the forked process is on the other end of that socket reading the data as each playbook runs and it writes it to console log? so I think you just have to echo ya | 18:22 |
clarkb | rather than use a special annotation | 18:23 |
clarkb | pabelanger: or does it run on the zuul launcher itself? I think its the nodepool node | 18:23 |
jeblair | zuul_stream is the callback plugin which runs on the executor. | 18:24 |
jeblair | when i suggested that instead of saving the console log, we should save the output of 'the callback plugin' that's what i was referring to | 18:24 |
clarkb | jeblair: so there is a forked process for every job on the executor with an open socket reading the writes from the job itself? | 18:24 |
mordred | jeblair: reading | 18:25 |
clarkb | the callback plugin is running in the context of ansible execution on the executor, but wondering where the forked process that reads from that is | 18:25 |
clarkb | (thats the bit I am currently confused about, though its not super important here) | 18:26 |
jeblair | clarkb: the callback plugin forks a process to read the stdout over TCP from each *node* | 18:26 |
clarkb | gotcha thanks | 18:27 |
jeblair | (so we don't fork on every ansible 'command' execution -- just on the first ansible 'command' execution for a given node) | 18:27 |
clarkb | ya | 18:27 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Add revoke-sudo role and update tox jobs https://review.openstack.org/441467 | 18:27 |
pabelanger | ^ was the task I meant that will not show up in console.log | 18:27 |
mordred | jeblair: yes - I believe that is actually the right way forward - I have not done that yet because I think we'll wind up doing that as a matter of course when we plumb streaming all the way through | 18:28 |
mordred | jeblair: that is - the thing we wind up writing to console.log should be the same thing as what we do from our streaming | 18:28 |
clarkb | pabelanger: https://review.openstack.org/#/c/441617/10/playbooks/roles/prepare-workspace/tasks/main.yaml is roughly where I would collect the data | 18:29 |
clarkb | pabelanger: have a dump of it all in one place, hostname, networking, etc | 18:29 |
jeblair | mordred: where does the output of zuul_stream go right now? | 18:29 |
clarkb | (but I agree that a host-info role is appropriate rather than part of workspace info) | 18:30 |
pabelanger | ++ | 18:30 |
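
The sort of output being asked for, expressed as the shell a host-info/net-info role could run at the start of a job; the exact commands are a guess at what "hostname and some basic host networking information" would cover:

```shell
echo "=== host info ==="
hostname -f
uname -a
echo "=== network info ==="
ip addr show
ip route show
```
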
mordred | jeblair: hrm - actually, that should be what's writing console.log | 18:30 |
mordred | jeblair: so, ignore me - this should already be the case | 18:31 |
jeblair | mordred: zuul_log writes /tmp/console.log on the worker node | 18:32 |
mordred | http://zuulv3-dev.openstack.org/logs/1ce3b8446a594d8c8f07092786aba219/ansible_log.txt | 18:32 |
jeblair | mordred: zuul_stream runs on the executor and reads /tmp/console.log and interpolates it with normal ansible output | 18:32 |
mordred | yes. normal ansible output goes to ansible_log.txt on the executor | 18:32 |
jeblair | mordred: okay, so zuul_stream should be going to that, yeah | 18:32 |
jeblair | mordred: i don't see the console output there? | 18:32 |
pabelanger | so, in the case of: https://review.openstack.org/#/c/441467/20/playbooks/roles/revoke-sudo/tasks/main.yaml I think we should have added a shell: echo "Remove sudo access for zuul user.", so console.log would have seen it | 18:33 |
mordred | 2017-03-21 20:49:31,202 p=6495 u=zuul | [WARNING]: Failure using method (v2_playbook_on_task_start) in callback plugin | 18:33 |
mordred | (<ansible.plugins.callback.zuul_stream.CallbackModule object at | 18:33 |
mordred | 0x7f1b8d8f0550>): all not found in hostvars | 18:33 |
mordred | well- there's at least one issue in there-although I think jamielennox has a patch up to deal with that | 18:33 |
pabelanger | ya, we haven't restarted zuul yet | 18:33 |
jeblair | mordred: yeah, that was I9274a2098348b736198e5fea344f078ee0404b41 which merged | 18:34 |
mordred | cool | 18:34 |
mordred | there is likely a bug then - will start staring at code | 18:34 |
jeblair | cool | 18:34 |
mordred | it SHOULD be in there - that said, that ansible_log.txt file is ugly - so we may also want to do additional things | 18:35 |
jeblair | mordred, clarkb, pabelanger: so aiui, we should stop copying /tmp/console.log in our post-playbook, and instead, copy the ansible_log.txt file as its replacement. | 18:35 |
jeblair | mordred: yeah, then i think we should look at maybe doing something sane with the rsync output. that's the biggest unreadable mess. | 18:36 |
mordred | yes | 18:36 |
mordred | to both things | 18:36 |
jeblair | SpamapS: that actually removes one of the main drivers for having /tmp/console.log be a known location. after we do that, i think we can feel free to make it a regular anonymous tmpfile. :) | 18:36 |
mordred | we actually should be able to just stop copying anything in the post-playbook - ansible writes it directly into the workspace already | 18:36 |
SpamapS | jeblair: neat. :-D | 18:37 |
jeblair | mordred: er yeah, that's a good point. i mean, we did just link directly to the file. :) | 18:37 |
mordred | jeblair: so - 2 things to sort - a) make that rsync output less ugly | 18:37 |
SpamapS | jeblair: I've since had the requisite two cups of coffee, so my beard is a bit less grey and I'm far less cranky. :) | 18:37 |
mordred | jeblair: b) figure out why the console output isn't showing up in there in the first place | 18:37 |
mordred | SpamapS: you only require 2 ? | 18:37 |
jeblair | SpamapS: storing coffee in your beard for later? :) | 18:38 |
clarkb | re rsync maybe just summarize "x bytes transfered into y files" ? | 18:38 |
mordred | jeblair: I think large pile of # 127.0.0.1:22 SSH-2.0-OpenSSH_7.2p2 Ubuntu-4ubuntu2.1 lines are super helpful too :) | 18:38 |
pabelanger | Oh, I figure that out | 18:38 |
pabelanger | it was ssh-keyscan | 18:38 |
jeblair | mordred: oh that's actual stdout from the test | 18:38 |
mordred | jeblair: yay! | 18:38 |
jeblair | mordred: so -- yes, we should get rid of it because it's annoying stuff in our unit tests, but from a zuul arch point of view, that is correct output from the job which should be included. :) | 18:39 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Organize playbooks folder https://review.openstack.org/441547 | 18:39 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Create tox-tarball job https://review.openstack.org/441609 | 18:39 |
mordred | we could just plop a nolog onto our rsync of the git repos | 18:39 |
jeblair | mordred: and as pabelanger says, it's in the process of being removed. | 18:39 |
jeblair | mordred: cool, that sounds nice and easy | 18:40 |
mordred | yah | 18:40 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Silence command warnings https://review.openstack.org/448748 | 18:40 |
mordred | speaking of ^^ that's a meaningless change but just happened to notice when I was looking at something else - I'm also happy if people don't think it's a good idea | 18:41 |
jeblair | oh i wish i knew that for v2.5 :) | 18:41 |
clarkb | is the "all not found in hostvars" the thing you were saying was fixed? | 18:41 |
clarkb | and if so, next question is why did that not make the job fail? | 18:42 |
jeblair | clarkb: i think it's just in the callback plugin which isn't critical? | 18:42 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Stop logging git repo rsync output https://review.openstack.org/448750 | 18:42 |
mordred | yes. the callback plugin is "just for logging" | 18:42 |
clarkb | I'm conflicted by that statement :) | 18:43 |
clarkb | if logging doesn't work then I have no way of knowing a success was legit | 18:43 |
clarkb | I think logging not working should be a failure? | 18:43 |
mordred | when we get zuul to start interpreting the output of running stuff, we should likely figure out a way to trap for that and fail hard | 18:43 |
mordred | clarkb: yes. I agree | 18:43 |
mordred | that's why I put it in scare-quotes | 18:43 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Stop copying console.log https://review.openstack.org/448752 | 18:45 |
jeblair | That's just a simple remove but i put in a todo about the tmpfile thing | 18:45 |
SpamapS | mordred: yes, 2 cups == human sauce ... more and I become a _pleasant_ human | 18:46 |
* SpamapS afk's | 18:46 | |
jeblair | but i'm going to WIP that because we shouldn't land it until mordred fixes the callback plugin | 18:46 |
mordred | before we investigate TOO much further, it might be nice to restart with jamielennox's change applied | 18:47 |
jeblair | oh! restart! | 18:47 |
jeblair | i forgot about that | 18:47 |
jeblair | we so rarely have to now :) | 18:47 |
mordred | the original testing was that stuff got written to ansible_log.txt | 18:47 |
mordred | jeblair: I know! it's exciting | 18:47 |
pabelanger | okay, a little confused about 448752 | 18:49 |
pabelanger | so, what is the console.log used for? | 18:49 |
pabelanger | just live streaming I guess | 18:49 |
jlk | SpamapS: alright I think I'm going to give up on Docker Toxxer too, at least on OSX. | 18:51 |
Shrews | jlk: you should give up on OSX | 18:52 |
Shrews | i just can't use it for real development anymore | 18:52 |
jlk | heh, I really don't feel like going through the pain of re-imaging this laptop | 18:53 |
jeblair | mordred: restarted | 18:53 |
Shrews | jlk: buy a new laptop! :) | 18:53 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Stop copying console.log https://review.openstack.org/448752 | 18:53 |
jlk | Shrews: you mean make IBM buy me a new laptop | 18:53 |
Shrews | yes. that. | 18:53 |
jlk | I haven't actually owned my own laptop since... 2002~ | 18:54 |
mordred | pabelanger: yes - console.log on the remote host is where things write content so that the log stremer can stream it | 18:55 |
mordred | pabelanger: we do things in the command and shell modules to cause the stdout/stderr to go there instead of being returned in the ansible return structure | 18:56 |
mordred | pabelanger: then in the zuul_stream callback plugin on the executor we open a socket connection to the daemon on the remote node and read the stdout/stderr from it and inject it into the output | 18:56 |
mordred | pabelanger: it's a bit of a strange dance - it's probably worth a diagram at some point | 18:57 |
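
Conceptually, the reader that zuul_stream forks per node does little more than the following; hostname and port are placeholders, not values taken from this log:

```shell
# Connect to the console-streaming daemon on the worker node and append
# whatever it streams (the worker-side console.log) into the job's log on
# the executor
nc "$WORKER_NODE" "$CONSOLE_STREAM_PORT" >> ansible_log.txt
```
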
pabelanger | do we think people will not be confused between the 2 outputs? stream and archived logging? | 18:57 |
Shrews | that screams "diagram please" | 18:57 |
clarkb | jlk: double check you have vmx? | 18:57 |
clarkb | if speed is the only problem, looks like bhyve doesn't actually require vmx for single cpu VMs so you might be getting slow emulation | 18:58 |
jlk | clarkb: it's a known issue with OSX and mounted host volumes | 18:58 |
jlk | it's just super slow. There are some hacky go-arounds, like using unison to sync files into the container rather than do a volume mount. That made it much faster, but tests that should pass are just failing | 18:58 |
jeblair | pabelanger: let's see if we can get a good example and see if it's confusing. | 18:58 |
*** harlowja has quit IRC | 18:59 | |
pabelanger | agree! I think diagram will help too | 18:59 |
jlk | I suppose it could be me testing on Fedora rather than Ubuntu | 18:59 |
jeblair | pabelanger: well, a diagram will help us understand how it works. a diagram must not be necessary to help a user understand the output. | 18:59 |
*** harlowja has joined #zuul | 19:00 | |
*** harlowja has quit IRC | 19:01 | |
jeblair | pabelanger, mordred: http://zuulv3-dev.openstack.org/logs/07206b6a6d514c40b4852254e2993f8a/ansible_log.txt is much better | 19:08 |
jeblair | though it seems to be missing the tox output? | 19:09 |
Shrews | jeblair: i wonder if we should add retry logic to our delete* zk api methods? that kazoo recursive delete() issue keeps popping up: http://logs.openstack.org/95/448395/1/check/nodepool-coverage-ubuntu-xenial/86a50ab/console.html#_2017-03-22_06_07_58_052595 | 19:11 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Add net-info role https://review.openstack.org/441617 | 19:11 |
jeblair | Shrews: the shared locks would let us fix that, right? | 19:12 |
Shrews | jeblair: yeah, it would | 19:12 |
Shrews | assuming it ever gets merged :) | 19:13 |
jeblair | Shrews: maybe as a temporary measure, so we can land changes? :) | 19:13 |
jeblair | Shrews: oh, it'll get merged somewhere | 19:13 |
Shrews | jeblair: let's hold off a bit then. it doesn't bite us too terribly often. if it gets worse, then yeah | 19:13 |
jeblair | pabelanger, mordred: it looks like maybe we only got the console log for the first task? | 19:14 |
jeblair | pabelanger, mordred: oh! it's because we have multiple playbooks | 19:17 |
jeblair | pabelanger, mordred: the zuul_stream log reading subprocess terminates at the end of a playbook, but the daemon_stamp file still exists, so the next zuul_stream callback (for the next playbook) does not launch a new subprocess. | 19:17 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Add net-info role https://review.openstack.org/441617 | 19:18 |
jeblair | we either need to make the stamp file specific to a playbook (where does it get written anyway?), or clean it up properly. | 19:18 |
mordred | jeblair: OH! | 19:19 |
clarkb | could we avoid the stamp file entirely and have ansible parent process kill the child using atexit? | 19:19 |
jeblair | clarkb: the stamp is so that two tasks within the same playbook don't each start threads | 19:20 |
jeblair | er, processes | 19:20 |
jeblair | (heh, if it were threads this would be easy) | 19:20 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Add net-info role https://review.openstack.org/441617 | 19:21 |
clarkb | right instead of using a file though ust do process management? | 19:21 |
clarkb | eg "do I have a child log worker process if no then create one" then later when process dies kill its child so it doesn't zombie | 19:21 |
jeblair | clarkb: how do we prevent a second process from running for a second task? | 19:21 |
pabelanger | readying backscroll | 19:22 |
pabelanger | reading* | 19:22 |
jeblair | clarkb: oh i think i see what you're saying -- keep track of subprocesses in memory in the callback. i'm going to turn this over to mordred because i'm fuzzy on the ansible internal details here. :) | 19:23 |
clarkb | jeblair: ya either that or query the operating system for members of the process group and filter | 19:23 |
*** bhavik1 has joined #zuul | 19:24 | |
jeblair | clarkb, mordred: i will say that part of the reason we don't want to open a new tcp connection for each task is that we get the whole stream again each time. so actually, we probably don't want to just naively make the callback plugin do another subprocess for a second playbook -- we'll get a copy of everything that has come before. | 19:26 |
* mordred is trying to remember if there was a reason we didn't do it that way originally | 19:26 | |
clarkb | jeblair: oh interesting | 19:26 |
mordred | yah | 19:26 |
clarkb | jeblair: even though they are bound by different invocations of ansible-playbook? | 19:26 |
mordred | it's because on the remote host we don't have any idea that the first ansible-playbook stopped | 19:27 |
jeblair | mordred: maybe we thought that this would span playbooks and that's why we made a stamp file, but the error is that we weren't expecting the subprocess to die at the end of the playbook? | 19:27 |
mordred | yes. I think we made that logic error | 19:27 |
jeblair | (i'm assuming it does die, but i'm only inferring that from log output) | 19:27 |
mordred | which is pretty obvious now in hindsight | 19:28 |
mordred | I mean - we launch a new subprocess for each playbook | 19:28 |
jeblair | mordred: yes, though we set p.daemon, and without looking that up in the manual -- i might make assumptions as to what it does. :) | 19:28 |
mordred | yah. | 19:28 |
jeblair | to be clear, i'm going to modify and repeat an earlier thing i typed: | 19:29 |
jeblair | mordred: maybe we thought that this would span playbooks and that's why we made a stamp file, but the error is that we weren't expecting the *log streaming* subprocess to die at the end of the playbook? | 19:29 |
jeblair | (just in case that was ambiguous as to which 'subprocess' i meant there) | 19:29 |
*** bhavik1 has quit IRC | 19:29 | |
jeblair | "When a process exits, it attempts to terminate all of its daemonic child processes." | 19:30 |
mordred | jeblair: yes - I think that may be the case | 19:30 |
jeblair | so, yeah, according to docs, it seems it's expected for our ("daemon") log-streaming subprocess to die | 19:30 |
jeblair | (this is via the multiprocessing module) | 19:31 |
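
A quick way to see the documented behaviour quoted above: a multiprocessing child started with daemon=True is terminated when its parent exits, which is why the streaming subprocess dies with each ansible-playbook run. This is just a stand-in demonstration, not zuul's actual code:

```shell
python -c '
import multiprocessing, time

def stream_forever():
    # stand-in for the log-streaming subprocess
    time.sleep(60)

p = multiprocessing.Process(target=stream_forever)
p.daemon = True
p.start()
print("parent exiting; daemonic child gets terminated with it")
'
```
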
jeblair | i have to forage for food now | 19:31 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Add net-info role https://review.openstack.org/441617 | 19:32 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Add net-info role https://review.openstack.org/441617 | 19:33 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Add net-info role https://review.openstack.org/441617 | 19:35 |
pabelanger | k, sorry for noise | 19:35 |
pabelanger | the only piece we might want, is info about which zuul-executor was used | 19:35 |
pabelanger | but, we don't log that currently | 19:36 |
*** hashar has joined #zuul | 19:40 | |
mordred | jeblair: ugh. so it's really caused by the fact that we used multiprocessing - which we used because trying to get a multiprocessing subprocess from ansible to spawn a daemon process the way you did in the log streamer originally died in a fire | 19:40 |
mordred | jeblair: ping me when you get back from food | 19:41 |
*** harlowja has joined #zuul | 19:45 | |
jlk | pabelanger: when using dox, did you just pip install it, or install from git? | 19:56 |
pabelanger | jlk: git | 19:59 |
pabelanger | I also had to rebuild the image from docker | 19:59 |
pabelanger | I was going to push up some reviews tonight | 19:59 |
jlk | okay. I wanted this to work on Fedora, but it might not. :( | 20:00 |
pabelanger | I ran dox from fedora-25 | 20:00 |
jlk | I don't know if that's th eissue. | 20:00 |
pabelanger | but used xenial images | 20:00 |
jlk | I meant the image _in_ docker | 20:00 |
pabelanger | containers* | 20:00 |
pabelanger | ya, I haven't tried fedora yet | 20:00 |
jlk | like I wanted tox to pass on fedora-d25 | 20:00 |
jlk | I don't know if that's my problem locally. | 20:00 |
pabelanger | Ya, I've been testing on fedora-25 too locally, tox does work | 20:02 |
pabelanger | I'll try fedora-25 dox later tonight too | 20:03 |
jlk | okay, well then it's probably just that docker on osx is too unstable for the py27 tox target. pep8 worked, and I got it reasonably fast, but py27 falls over for unknown reasons | 20:07 |
mordred | jlk: you saw that SpamapS had timeouts with his docker too - his hunch was aufs | 20:12 |
jlk | hrm | 20:14 |
jlk | COFFEEEEEEE | 20:15 |
mordred | yah man | 20:15 |
mordred | I'm doing that right now | 20:15 |
*** harlowja has quit IRC | 20:18 | |
pabelanger | need some coffee too... or I should just nap | 20:19 |
*** hashar has quit IRC | 20:23 | |
*** hashar has joined #zuul | 20:23 | |
jlk | okay, maybe I can think clearly now. | 20:34 |
mordred | jlk: just to be sure - have another mug | 20:39 |
jeblair | mordred: i am sufficiently burritoed. | 20:42 |
mordred | jeblair: woot! | 20:42 |
mordred | jeblair: so - I have a call with cdub in 18 minutes | 20:42 |
mordred | jeblair: but - I have been thinking about our issue ... and have 2 ideas | 20:44 |
mordred | jeblair: one is to make the streaming interaction with the worker richer - like, rather than telnet/netcat, have it be more complex, maybe with a playbook id that can get passed in so that playbooks can request to start at a point in time or something | 20:45 |
mordred | jeblair: which seems like a lot of work and probably more and more fragile - but ultimately is still doable | 20:45 |
mordred | jeblair: but the thing I like, which is even more hacky but likely to maybe/probably be simpler/more resilient | 20:46 |
mordred | is to spin up a "logging" thread/subprocess on the zuul-side which runs a playbook that basically does the zuul_log module on all of the hosts in the inventory, then just starts streaming the results back into the ansible_log.txt file like the daemon process does now | 20:47 |
mordred | and have zuul kill that subprocess when it's done executing the job | 20:48 |
jeblair | mordred: re the first thing -- that would mean having the zuul_log plugin thingy annotate the console.log with metadata, yeah? or maybe having it write out different streaming output files for each playbook (then obviously being able to serve each of them)? | 20:48 |
mordred | jeblair: yes - one of those two things | 20:48 |
jeblair | mordred: i was thinking of the problem and had two thoughts as well -- | 20:48 |
mordred | neat! | 20:48 |
jeblair | mordred: one was that part of the "start from beginning" thing was for humans, and we don't need that anymore on the worker node. so we can maybe think about dropping that. but we still have synchronization issues (like the worker starts recording data before the callback starts fetching it), so maybe we still need it. | 20:49 |
jeblair | mordred: the second was almost exactly what your second idea was. :) | 20:49 |
jeblair | mordred: so that's two votes for that. :) | 20:50 |
mordred | jeblair: woot! maybe we should explore that one for now then | 20:50 |
jeblair | mordred: sounds like a plan | 20:51 |
mordred | woot | 20:51 |
mordred | I'll think about it in earnest after my next call | 20:51 |
*** harlowja has joined #zuul | 21:33 | |
jlk | Anybody seen anything like "Exception: Job project-gerrit in common-config is not permitted to shadow job project-gerrit in common-config" ? | 21:49 |
SpamapS | not I | 21:50 |
jlk | okay cool, so this is definitely broken | 21:53 |
jeblair | jlk: no; zuul emits that when two jobs with the same name appear in two different repos. so it suggests zuul thinks that the first "common-config" is different than the second "common-config" repo in some way. | 21:57 |
jlk | that's... bizarre | 21:57 |
jeblair | jlk: (maybe it appeared twice in the repo listing in the tenant? i don't think we have any sanity checking around that yet) | 21:58 |
jlk | http://paste.openstack.org/show/603828/ is the zuul.yaml | 21:59 |
jhesketh | Morning | 21:59 |
jlk | oh so | 22:00 |
jlk | maybe | 22:00 |
jlk | http://paste.openstack.org/show/603829/ | 22:00 |
jlk | that's the tenant config | 22:00 |
jeblair | yep | 22:00 |
SpamapS | oh man | 22:01 |
SpamapS | I broke subunit | 22:01 |
SpamapS | Length too long: 18335405 | 22:01 |
jlk | jeblair: I took a guess on the tenant config, what's the right way to list multiple sources? | 22:01 |
jlk | can a tenant have more than one connection? | 22:02 |
jeblair | jlk: i think in production that should work, but the test framework probably won't deal with that correctly -- they'll need different git repo names. also, in general, we're going to have a devil of a time with identical repo names until we implement http://lists.openstack.org/pipermail/openstack-infra/2017-March/005208.html | 22:04 |
jeblair | jlk: so yes, can have multiple connections, will just (temporarily) need distinct git repo names across them | 22:04 |
jlk | I can see the 'repos to test" needing to be unique, but I think where it's falling over is that there needs to be different repos to grab configuration from | 22:05 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: [WIP] Add net-info role https://review.openstack.org/441617 | 22:05 |
jeblair | jlk: i agree, while it's possible/likely that the test framework will not do the correct thing with the underlying git repos if they have the same name, that probably isn't the actual issue you are hitting here (unless you have distinct content in those two identically named repos -- then zuul would probably read the same content twice because of that error) | 22:07 |
jeblair | jlk: the actual error is that the model.Project object associated with each of those is the same, despite being from two different connections | 22:07 |
jlk | they aren't identically named repos, they're literally the same repo. I was approaching this from a "centrally configured service that works with multiple project locations" | 22:08 |
jlk | so one configuration repo, handling projects that live in more than one connection location | 22:08 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: [WIP] Add net-info role https://review.openstack.org/441617 | 22:08 |
jeblair | jlk: but one of them is accessed via gerrit, and one is acccessed via github... | 22:08 |
jlk | the config repo? It's accessed by git. | 22:09 |
jlk | unless I'm really misunderstanding something | 22:09 |
jlk | I shouldn't have to host my zuul configuration in somebody's gerrit in order to connect to that gerrit | 22:09 |
jeblair | jlk: that file tells zuul where to find all the repos it works with. it's a list of each connection, and for each connection, a list of repos accessed via that connection. | 22:10 |
jlk | hrm. | 22:10 |
jeblair | jlk: so that config says "work with the common-config repo on the github server" and "work with the common-config repo on the gerrit server" | 22:11 |
jlk | My brain hasn't caught up with the whole "config in git" world. | 22:11 |
jlk | I'm trying to replicate where all the config lives on the local filesystem | 22:11 |
jeblair | jlk: the "git" driver will do that for you | 22:11 |
jlk | like, I want _one_ place to put my zuul configuration, even if that configuration is used by multiple connections | 22:11 |
jeblair | jlk: i think there's still a mismatch | 22:11 |
jeblair | jlk: the zuul configuration isn't used by connections -- it's global. it loads bits of the config from every git repo it knows about. | 22:12 |
jeblair | jlk: regardless of which connection it uses to access each of those repos | 22:12 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: [WIP] Add net-info role https://review.openstack.org/441617 | 22:13 |
jeblair | jlk: so the part of the config you load from github can be used by projects in gerrit, and vice versa | 22:13 |
jeblair | (it all goes in to one pot) | 22:13 |
jlk | okay, okay... | 22:13 |
jlk | so I should be able to list ONE config source, and it may list multiple pipelines, which use multiple drivers | 22:14 |
jlk | and I could use the "git" source in tenant config to deliver it by raw git, and not bother with trying to go through "github" | 22:14 |
jeblair | jlk: the second thing yes. the first thing i'm still having trouble with. :) | 22:15 |
jeblair | jlk: mostly because zuul won't know about any projects that aren't listed in the tenant config. | 22:16 |
jlk | er... | 22:17 |
jeblair | jlk: http://paste.openstack.org/show/603830/ is valid | 22:17 |
jlk | but we don't list projects in the tenant config | 22:17 |
jeblair | jlk: if that's what you're getting at | 22:17 |
jeblair | jlk: (then foo and bar can both be enqueued into a pipeline defined in project-config) | 22:17 |
jlk | I should re-state. | 22:17 |
jlk | _I_ haven't been listing project repos in the tenant config, I've only been listing them in the zuul.yaml file within the config repo | 22:18 |
jlk | I may have been doing this all wrong! | 22:19 |
jeblair | jlk: true, we list 'repos' in the tenant config, and 'projects' in zuul.yaml. however, it turns out that they need to refer to the same objects in memory anyway, and it's a little confusing, so in that email i'm proposing we start using the word 'project' in the tenant config as well. | 22:19 |
jlk | so what I've had that seems to be passing tests, is | 22:19 |
SpamapS | jeblair: when you say "replace NullChange with enqueueing Refs" .. just so I know we're on the same page.. what you meant was to enqueue a Ref that represents HEAD of the given project yes? | 22:20 |
jeblair | jlk: really every project should show up two places: once in the tenant config ("main.yaml") to tell zuul to go fetch it and read its .zuul.yaml file, and at least once in a zuul.yaml or .zuul.yaml file within one of those projects. [we will probably need to make exceptions for foreign projects in the third-party ci case, but let's ignore that for this conversation] | 22:20 |
jlk | http://paste.openstack.org/show/603831/ | 22:20 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: [WIP] Add net-info role https://review.openstack.org/441617 | 22:21 |
jeblair | SpamapS: yes, or possibly the tip of a given branch. i think that will make periodic jobs make much more sense (zuul says "test this ref. don't worry about where it's from") | 22:21 |
SpamapS | jeblair: great. Just checking before I fall down the "how do I ask Gerrit for that" hole | 22:21 |
jeblair | SpamapS: might be able to ask a zuul merger too (that might make it slightly more driver independent) | 22:22 |
jeblair | jlk: you should list org/one-job-project in main.yaml as well; at the very least, it won't be able to have a .zuul.yaml file if it's not listed there. | 22:23 |
jeblair | pabelanger: i've flipped to a -1 on https://review.openstack.org/447647 can you see my comment, please? | 22:24 |
jlk | right, we aren't... testing that path yet. | 22:24 |
jlk | it just seemed to "work" as far as unit tests are concerned. | 22:24 |
jeblair | pabelanger: i think we need a working gate-zuul-nodepool job to see the failure i mentioned. | 22:24 |
SpamapS | jeblair: Ah, ok. Hadn't thought of that. Currently changing zuul/gerrit/source.py to stop spewing NullChanges though, so right now I think it's ok. | 22:24 |
jlk | because the entirety of the config was in the common- repo | 22:24 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: [WIP] Add net-info role https://review.openstack.org/441617 | 22:24 |
SpamapS | our zuul/source/gerrit.y maybe? | 22:24 |
pabelanger | jeblair: sure. in fact, it should be fixed now, I can check experimental | 22:25 |
jeblair | jlk: yeah, i can see how that would work. | 22:26 |
jlk | AHAHAHAHAHA | 22:26 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: [WIP] Add net-info role https://review.openstack.org/441617 | 22:26 |
jlk | oooops | 22:26 |
jlk | jeblair: so yeah, I just tried to list the project repo in main.yaml | 22:26 |
jeblair | jlk: we may even want to keep that working for the third-party-ci foreign-project case. but generally speaking, we'd want to list all the projects there so that they can have .zuul.yaml files. | 22:26 |
jlk | and ran into a lovely: File "zuul/driver/github/githubsource.py", line 81, in getProjectBranches raise NotImplementedError()" | 22:27 |
jeblair | jlk: that sounds useful! | 22:27 |
jlk | looks like I clearly haven't added the capability to fetch zuul files from project repos to the github driver yet. | 22:27 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: [WIP] Add net-info role https://review.openstack.org/441617 | 22:27 |
jeblair | jlk: that's a new thing in v3. it's "list all the branches of a project, so i can come right back and ask for a .zuul.yaml file on every one of them". | 22:27 |
jlk | okay well, I'll go further down the path of not listing the projects there yet | 22:28 |
jlk | because I removed it, and got the config to load (and the test fails later down the road, but that's a good start) | 22:28 |
jeblair | cool | 22:28 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: [WIP] Add net-info role https://review.openstack.org/441617 | 22:29 |
jlk | jeblair: thank you for the guidance! | 22:29 |
pabelanger | Okay, neat! | 22:30 |
jeblair | jlk: np! | 22:30 |
pabelanger | I figured out the hostname of zuulv3-dev from ansible | 22:30 |
pabelanger | however... | 22:30 |
pabelanger | I think it is a security issue | 22:30 |
pabelanger | https://review.openstack.org/#/c/441617/ | 22:30 |
pabelanger | pretty sure we don't want to allow that ^ | 22:30 |
jlk | a lookup outside of the safe path? | 22:31 |
pabelanger | specifically the lookup for executor | 22:31 |
pabelanger | ya | 22:31 |
jlk | isn't this run trusted? | 22:31 |
jlk | or is this the untrusted bits? | 22:31 |
pabelanger | untrusted | 22:31 |
jlk | yeah, okay. | 22:31 |
pabelanger | basically, I think we need to disable lookups for untrusted stuff | 22:31 |
pabelanger | which I think we can do in ansible.cfg? | 22:32 |
jeblair | mordred: ^ fyi | 22:32 |
jlk | file lookups for sure | 22:32 |
jlk | and probably template too | 22:32 |
pabelanger | ya | 22:32 |
jlk | since that is also doing the same thing? | 22:32 |
pabelanger | I didn't want to dig more into it | 22:32 |
pabelanger | https://docs.ansible.com/ansible/intro_configuration.html#lookup-plugins | 22:33 |
pabelanger | we should be able to set that to None for untrusted | 22:33 |
jeblair | if lookup is useful, we can also make a sanitized version of it like the other plugins | 22:34 |
jlk | oh dear there are a lot of lookups | 22:34 |
mordred | oh wow. | 22:34 |
pabelanger | They could be, but lookups are limited to the side where ansible-playbook runs | 22:34 |
pabelanger | so, would we want jobs looking up things on executors? | 22:34 |
jlk | well... | 22:34 |
jlk | you could lookup passwords from a password store | 22:35 |
jlk | or DNS entries, or... | 22:35 |
pabelanger | right | 22:35 |
jlk | oh god, redis. | 22:35 |
mordred | yeah - this is a whole new layer of fun | 22:35 |
jlk | hahaha | 22:35 |
jlk | there's a pipe lookup | 22:35 |
jlk | NOTHING CAN GO WRONG THERE | 22:35 |
jlk | like that's just straight "run my code on your box please" | 22:35 |
pabelanger | I did try to gather facts on localhost, which was blocked. So that is good | 22:35 |
mordred | pabelanger: woot! | 22:36 |
jlk | and env lookups. We don't have anything important in the env, do we? | 22:36 |
mordred | maybe we start with disabling lookups while we go through the list of lookups to sanitize them | 22:36 |
pabelanger | just the defaults set up by bash for the zuul user | 22:36 |
jlk | yeah, shut 'em down | 22:37 |
jlk | we'll turn any on that we absolutely need | 22:37 |
pabelanger | mordred: ++ good place to start | 22:37 |
pabelanger | I mean, container things should help with this too right? | 22:37 |
jlk | eh... | 22:37 |
jlk | if we're going belt+suspenders | 22:37 |
mordred | yah - we should belt/suspenders it in both places | 22:37 |
jlk | trying to prevent local code execution | 22:38 |
pabelanger | okay, I'm good with disabling lookups :) | 22:38 |
jeblair | ++ belt and suspenders | 22:38 |
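One concrete way to "shut 'em down" and then selectively re-enable: point the untrusted ansible.cfg's lookup_plugins path at a zuul-owned directory and shadow each dangerous built-in (pipe, env, redis, ...) with an override that refuses to run. The sketch below uses the standard Ansible LookupBase API; the file location and the wiring into zuul are assumptions.

```python
# Hypothetical override file, e.g. zuul/ansible/lookup/pipe.py, shadowing
# Ansible's built-in `pipe` lookup during untrusted playbook runs.
from ansible.errors import AnsibleError
from ansible.plugins.lookup import LookupBase


class LookupModule(LookupBase):
    def run(self, terms, variables=None, **kwargs):
        # `pipe` runs arbitrary commands on the executor, so refuse outright.
        raise AnsibleError(
            "the 'pipe' lookup is not allowed in untrusted playbooks")
```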
pabelanger | so, I'd like to know the hostname of the executor, so I can log it. | 22:38 |
jeblair | pabelanger: so once we have containers, when we run into something like this we should say "oops -- look, zuul let me do this dangerous ansible that the container caught; let's go fix zuul so that doesn't happen" :) | 22:39 |
jlk | Do we have a list somewhere of all the plugins we're rejecting? | 22:39 |
pabelanger | aside from modifying zuul to add it to vars.yaml, any other suggestions? | 22:39 |
jlk | or are we rejecting _all_ custom plugins? | 22:39 |
pabelanger | jeblair: agree | 22:39 |
jeblair | pabelanger: add it to zuul.executor in vars.yaml | 22:39 |
pabelanger | k | 22:39 |
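For the zuul.executor.hostname variable pabelanger is after, the general shape would be to tag the per-job vars with the executor's hostname before they are written out to vars.yaml. The function and variable names below are hypothetical; only socket.gethostname() and PyYAML's safe_dump are standard.

```python
import socket

import yaml


def write_job_vars(path, zuul_vars):
    """Hypothetical sketch: dump per-job zuul vars, recording the executor."""
    zuul_vars.setdefault('executor', {})['hostname'] = socket.gethostname()
    with open(path, 'w') as f:
        yaml.safe_dump({'zuul': zuul_vars}, f, default_flow_style=False)


# jobs could then log it as {{ zuul.executor.hostname }}
```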
jeblair | jlk: dirs and files in zuul/ansible/* is more or less the list | 22:40 |
jlk | is that a whitelist or a blacklist? | 22:41 |
jeblair | jlk: untrusted mode forces those to be the only callback + action (and soon + lookup) plugins available to ansible, and refuses to run if a playbook or role has a plugins dir | 22:41 |
jlk | okay so we refuse any plugins/ dir | 22:42 |
jlk | that's good | 22:42 |
jeblair | jlk: yep, and override the built-in ones | 22:42 |
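jeblair's "refuses to run if a playbook or role has a plugins dir" amounts to a pre-flight scan along these lines. The function name, the exact set of directory names, and where zuul performs the check are assumptions for illustration.

```python
import os

# Directory names Ansible auto-loads plugins/modules from when they sit next
# to a playbook or inside a role (set assumed for illustration).
FORBIDDEN_DIRS = frozenset([
    'action_plugins', 'callback_plugins', 'connection_plugins',
    'filter_plugins', 'lookup_plugins', 'vars_plugins', 'library',
])


def check_no_plugin_dirs(path):
    """Refuse untrusted content that ships its own Ansible plugins."""
    for root, dirs, _files in os.walk(path):
        for d in dirs:
            if d in FORBIDDEN_DIRS:
                raise RuntimeError(
                    'refusing to run %s: found plugin dir %s' %
                    (path, os.path.join(root, d)))
```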
SpamapS | oh yay, I fixed enough stuff that subunit works again | 22:42 |
SpamapS | FAILED (id=16, failures=26 (-8)) | 22:42 |
jlk | we should add connections to that | 22:43 |
mordred | jlk: whatchamean? | 22:43 |
jlk | We should narrow down what connection plugins are allowed | 22:44 |
mordred | ++ | 22:44 |
jlk | so that a user couldn't influence a host in the inventory to have new facts, which would include ansible_connection | 22:44 |
jlk | do we have something trying to stop users from changing ansible_host ? | 22:45 |
jlk | either by set_fact or add_host ? | 22:46 |
mordred | jlk: we do not - but we do prevent them from connecting to localhost | 22:48 |
jlk | in what way? | 22:48 |
jlk | I think I recall something checking if the connection is "local" or localhost | 22:49 |
jlk | or 127.0.0.1 | 22:49 |
jeblair | pabelanger, Shrews, rbergeron: i have started on the nodepool config structure update described at http://lists.openstack.org/pipermail/openstack-infra/2017-January/005018.html i hope to have it finished this week. | 22:52 |
mordred | jlk: yah - that - we _do_ block add_hjost | 22:54 |
mordred | add_host | 22:54 |
jlk | what about set_fact? | 22:54 |
mordred | jlk: we do not block set_fact for ansible_host - so we should probably do that (and go ahead and do ansible_connection too) | 22:54 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool feature/zuulv3: WIP Update nodepool config syntax https://review.openstack.org/448814 | 22:54 |
jlk | yeah I thought you had something in the connection plugin we force that itself prevents an attempt to hit localhost tho | 22:54 |
mordred | jlk: in the "normal" action plugin | 22:55 |
mordred | jlk: we do: | 22:55 |
mordred | if (self._play_context.connection == 'local' | 22:56 |
mordred | or self._play_context.remote_addr == 'localhost' | 22:56 |
mordred | or self._play_context.remote_addr.startswith('127.') | 22:56 |
mordred | or self._task.delegate_to == 'localhost' | 22:56 |
mordred | or (self._task.delegate_to | 22:56 |
mordred | and self._task.delegate_to.startswith('127.'))): | 22:56 |
jlk | gotcha | 22:56 |
jlk | I wonder if play_context gets updated by the host in question | 22:56 |
mordred | so we should definitely block the set_fact route | 22:56 |
jeblair | pabelanger: for 448814 i have just started matching on provider.name.startswith('fake') for now (re the bug in 447647). i'm not super happy about that but it keeps me moving. | 22:56 |
jlk | like if the host context has the remote addr | 22:56 |
jlk | nod | 22:56 |
jlk | jeblair: for v3, a pipeline can only have one source (driver) still, right? | 22:58 |
mordred | jlk: I should probably do the old versions too - ansible_ssh_host and ansible_ssh_connection - right? those are still 'valid' just deprecated? | 22:58 |
jlk | correct | 22:58 |
jlk | http://docs.ansible.com/ansible/intro_inventory.html#list-of-behavioral-inventory-parameters | 22:58 |
jlk | want to prevent setting any ansible_ssh_ stuff | 22:59 |
jlk | or ansible_sftp* | 22:59 |
jlk | or ansible_scp | 23:00 |
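Blocking the set_fact route mordred mentions could look roughly like the override below: shadow the stock set_fact action plugin and reject the behavioral connection parameters (ansible_host, ansible_connection, and the deprecated ansible_ssh_*/ansible_scp_*/ansible_sftp_* forms). The blocked-prefix list and the subclassing approach are assumptions, not zuul's actual implementation.

```python
# Hypothetical override for Ansible's set_fact action plugin (wiring assumed).
from ansible.plugins.action.set_fact import ActionModule as SetFact

BLOCKED_PREFIXES = ('ansible_host', 'ansible_connection',
                    'ansible_ssh_', 'ansible_scp_', 'ansible_sftp_')


class ActionModule(SetFact):
    def run(self, tmp=None, task_vars=None):
        for key in (self._task.args or {}):
            if key.startswith(BLOCKED_PREFIXES):
                # Fail the task instead of letting a job redirect connections.
                return dict(failed=True,
                            msg='set_fact of %s is not allowed' % key)
        return super(ActionModule, self).run(tmp, task_vars)
```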
jeblair | jlk: currently. that needs to change as described in http://lists.openstack.org/pipermail/openstack-infra/2017-March/005208.html | 23:00 |
jlk | okay | 23:00 |
*** hashar has quit IRC | 23:01 | |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Create zuul.executor.hostname ansible variable https://review.openstack.org/448820 | 23:01 |
pabelanger | jeblair: okay, I haven't looked at 448814 as of yet. Likely tomorrow now | 23:02 |
jeblair | pabelanger: you don't have to look at it; it's not ready yet. i pointed out the bug i ran into on 447647. | 23:04 |
pabelanger | jeblair: okay. I confused the patches | 23:05 |
jlk | jeblair: yeah I think I'm bumping into that change barrier. I wonder if I should stop trying to shove this in, and accept that for now you can have gerrit, or you can have github, but you can't have both. | 23:06 |
jeblair | jlk: fwiw, i'm tentatively planning on working on that next week. :) | 23:06 |
jlk | oh then I definitely should. | 23:06 |
Shrews | jeblair: oh cool. Thx | 23:16 |
jlk | holy crap | 23:35 |
jlk | jeblair: I'm probably missing something really really basic, but I might have made this multiple drivers thing work with just one if statement in scheduler.py.... | 23:36 |
jlk | at least until you dig into it and tear all that apart :) | 23:36 |
SpamapS | jeblair: seems like it might be nice to stack that on top of a re-factored Change/Ref model | 23:42 |
jlk | hah nope, I broke it | 23:43 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul master: Remove Changeish and refactor around Ref as base https://review.openstack.org/448829 | 23:43 |
SpamapS | jeblair: ^^ WIP, fails 26 scheduler tests, but it's a start. :-P | 23:43 |
SpamapS | Some of them I think are failing because they're expecting some of the aspects of NullChange instead of having a Ref | 23:44 |
SpamapS | also I think I might need to dig back to HEAD^ for oldrev | 23:44 |
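A rough sketch of the model shape 448829 is heading toward, per the discussion above: Ref becomes the base object that pipelines enqueue (replacing NullChange), and a Change is just a Ref with review metadata on top. These class bodies are illustrative only, not zuul's real model code.

```python
class Ref(object):
    """Something enqueueable: a git ref of a project at a particular commit."""

    def __init__(self, project, ref, oldrev=None, newrev=None):
        self.project = project
        self.ref = ref          # e.g. 'refs/heads/master'
        self.oldrev = oldrev    # previous sha if known (e.g. HEAD^), else None
        self.newrev = newrev    # the sha to test


class Change(Ref):
    """A proposed change (Gerrit change, GitHub PR) layered on a Ref."""

    def __init__(self, project, ref, number, patchset, **kw):
        super(Change, self).__init__(project, ref, **kw)
        self.number = number
        self.patchset = patchset
```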
jlk | damn, a timer-based trigger doesn't get a trigger_name | 23:54 |
mordred | jlk: when we were talking plugin holes earlier - blocking set_fact of connection things, blocking connection plugins and blocking lookup plugins were the three things we talked about yeah? | 23:55 |
jlk | uh. | 23:56 |
jlk | that reads right from looking at backscroll | 23:57 |
mordred | sweet! | 23:59 |