*** jamielennox is now known as jamielennox|away | 00:27 | |
jlk | So nodepool, there's a launcher and a builder. Which needs a clouds.yaml credential? Do they both | 00:33 |
---|---|---|
jlk | ? | 00:33 |
*** jamielennox|away is now known as jamielennox | 00:41 | |
* rbergeron hands jlk a beer instead of an answer because that's what she has. :) | 00:42 | |
jlk | lol | 00:46 |
jlk | beer comes later, after Guardians 2 | 00:46 |
mordred | jlk: both | 01:09 |
mordred | jlk: launcher builds nodes - builder uploads to glance | 01:09 |
jlk | alrighty. Does v3 nodepool still use gearman? | 01:09 |
jlk | uh. | 01:10 |
mordred | it does not | 01:10 |
jlk | I'm seeing builder build them | 01:10 |
mordred | builder builds and uploads images - launcher builds nodes? | 01:10 |
jlk | oh sorry I misread | 01:11 |
jlk | I'm seeing builder build the images using diskimage-builder. I haven't got one to build yet, so I don't know what else happens :) | 01:11 |
jlk | I do see launcher querying the cloud | 01:11 |
mordred | ah! | 01:11 |
jlk | I have yet to see builder query the cloud | 01:11 |
mordred | yah - it likely won't until it has something to upload | 01:12 |
jlk | so nodepool just talks zookeeper | 01:13 |
mordred | yup - much simpler | 01:13 |
mordred | (well, and openstack, but yeah) | 01:13 |
jlk | heh | 01:14 |
jlk | whee, d-i-b can't find the qemu-img executable that's right close to it | 01:17 |
jlk | oh haha | 01:17 |
* jlk shakes fist at docker images that do not have "which" | 01:17 | |
jlk | also, kind of shame on d-i-b relying on 'which' to see if an app can be found. | 01:25 |
jlk | (complains about the lack of app, not the lack of which) | 01:25 |
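
The point about depending on an external `which` binary can be illustrated in Python, where the standard library resolves executables on PATH without shelling out. This is a minimal sketch and not diskimage-builder's actual code; `find_tool` is a hypothetical helper name.

```python
import shutil


def find_tool(name):
    """Return the full path of an executable on PATH, or raise a clear error.

    Uses shutil.which, so no external `which` binary is required.
    """
    path = shutil.which(name)
    if path is None:
        # Name the missing tool itself, not the lookup mechanism.
        raise FileNotFoundError(
            f"required executable not found on PATH: {name}"
        )
    return path
```

Something like `find_tool("qemu-img")` would then either return a path or fail with a message that names the missing tool.
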
*** dougbtv_ has quit IRC | 02:05 | |
Shrews | yah, the upload won't happen until dib build completes | 02:15 |
*** dougbtv_ has joined #zuul | 02:17 | |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Use six.moves.urllib for python3 compat https://review.openstack.org/463595 | 02:23 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Rename nodepool.py to launcher.py https://review.openstack.org/463807 | 02:23 |
*** dougbtv_ has quit IRC | 02:36 | |
*** dougbtv_ has joined #zuul | 02:39 | |
*** jamielennox is now known as jamielennox|away | 02:40 | |
*** dougbtv__ has joined #zuul | 02:56 | |
*** dougbtv_ has quit IRC | 02:58 | |
*** jamielennox|away is now known as jamielennox | 03:33 | |
*** dougbtv__ has quit IRC | 04:12 | |
*** dougbtv__ has joined #zuul | 04:12 | |
*** bhavik1 has joined #zuul | 06:12 | |
*** bhavik1 has quit IRC | 06:29 | |
*** DangerousDaren has joined #zuul | 06:47 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool feature/zuulv3: Add Dockerfile https://review.openstack.org/465852 | 07:40 |
tobiash | jlk: I don't have any public repo hosting the Dockerfile but I think it might make sense to add one directly to nodepool ^^ | 07:41 |
tobiash | that way it also could be easy to provide an official docker image on docker hub | 07:42 |
*** Cibo_ has joined #zuul | 07:51 | |
tobiash | it bakes the current workspace into the image for fast development cycles | 07:51 |
*** Cibo_ has quit IRC | 08:02 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul feature/zuulv3: Use ssh for git-upload-pack https://review.openstack.org/436802 | 08:06 |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool feature/zuulv3: Add Dockerfile https://review.openstack.org/465852 | 08:17 |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool feature/zuulv3: Add max-ready-age to label config https://review.openstack.org/463338 | 08:36 |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool feature/zuulv3: Support externally managed images https://review.openstack.org/458073 | 08:44 |
*** fbo has joined #zuul | 08:51 | |
*** isaacb has joined #zuul | 09:54 | |
*** jamielennox is now known as jamielennox|away | 10:04 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul feature/zuulv3: Add Dockerfile https://review.openstack.org/465912 | 10:18 |
*** jkilpatr has quit IRC | 10:39 | |
*** openstackgerrit has quit IRC | 10:48 | |
*** jkilpatr has joined #zuul | 11:15 | |
*** openstackgerrit has joined #zuul | 11:45 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul feature/zuulv3: Add Dockerfile https://review.openstack.org/465912 | 11:45 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Add missing cleanup to statsd fixture https://review.openstack.org/465962 | 13:14 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Add test config with min-ready of 0 https://review.openstack.org/465968 | 13:28 |
Shrews | pabelanger: 465968 should fix the fail seen in your np py3 changes | 13:28 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: sql-connection: make _setup_tables staticmethod https://review.openstack.org/465973 | 13:35 |
*** jamielennox|away is now known as jamielennox | 13:49 | |
Shrews | tobiash: did you happen to look into this failure on your change? http://logs.openstack.org/38/463338/2/gate/gate-nodepool-python27-ubuntu-xenial/531b862/testr_results.html.gz | 13:53 |
*** Cibo_ has joined #zuul | 13:54 | |
tobiash | Shrews: not yet, just rechecked and it worked | 13:54 |
tobiash | Shrews: ah, no, it was that gate failure | 13:55 |
Shrews | hmm, let's not recheck things that should have passed. that's something we should look at | 13:55 |
tobiash | I incorporated SpamapS worries about a race and uploaded a new patch version | 13:56 |
Shrews | might not be related to your review, but maybe another race we should hunt down | 13:56 |
Shrews | hrm, i think that test makes a false assumption | 13:58 |
tobiash | Shrews: ok, the recheck I did today failed due to a statsd related failure (which I think you fixed today) | 13:58 |
tobiash | Shrews: is that correct that this test wants two nodes of fake-label and assumes that it gets one of fake-provider1 and fake-provider2? | 14:02 |
tobiash | but I think (judging from the fixture) it also could happen that both nodes come from the same provider | 14:03 |
Shrews | tobiash: yes. but that's a wrong assumption. i'm now trying to determine the purpose of this test | 14:03 |
tobiash | Shrews: according to your commit message max-servers should be 1 | 14:04 |
tobiash | in the fixture | 14:04 |
Shrews | ? | 14:05 |
Shrews | ooh, image-type is missing from the .yaml. ah ha | 14:05 |
tobiash | Shrews: https://review.openstack.org/#/c/433127/ | 14:06 |
tobiash | cite: Enables the test_node_vhd_and_qcow2 test which is an example of | 14:06 |
tobiash | min-ready=2/max-servers=1 across two providers. | 14:06 |
Shrews | i don't think that matters | 14:08 |
tobiash | Shrews: the test fixture says min-ready=2/max-servers=2 | 14:08 |
tobiash | so I think if 2 nodes are requested there's a race which can result in 1,1; 1,2; 2,1; 2,2 | 14:09 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Fix imports for python3 https://review.openstack.org/463808 | 14:10 |
tobiash | and the test wants 1,2 or 2,1 | 14:10 |
tobiash | if the fixture would be changed to max-servers=1 there would be just the cases 1,2 and 2,1 possible which the test expects | 14:12 |
tobiash | possibly in some rare cases currently one provider fulfills both node requests before the other has the chance to | 14:13 |
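
The counting argument above (four outcomes with max-servers=2, two with max-servers=1) can be sketched as a small enumeration. This is only the naive placement model described in these messages, assuming every node lands on some provider; it is not nodepool's real request handling, and `possible_placements` is a hypothetical name.

```python
from itertools import product


def possible_placements(providers, requests, max_servers):
    """Enumerate which provider each requested node could land on.

    Naive model: each node goes to exactly one provider, and no provider
    may hold more than max_servers nodes.
    """
    outcomes = []
    for combo in product(providers, repeat=requests):
        if all(combo.count(p) <= max_servers for p in providers):
            outcomes.append(combo)
    return outcomes


print(possible_placements([1, 2], 2, max_servers=2))
# [(1, 1), (1, 2), (2, 1), (2, 2)]
print(possible_placements([1, 2], 2, max_servers=1))
# [(1, 2), (2, 1)]
```
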
Shrews | that shouldn't matter for this test if it were working properly | 14:15 |
*** Cibo_ has quit IRC | 14:15 | |
Shrews | which it isn't | 14:15 |
tobiash | the original v2 version of this test also seems to state max-servers=1 per provider | 14:19 |
tobiash | I think that was to force usage of both providers (one using vhd, the other qcow2) in parallel (which this test wants) | 14:20 |
pabelanger | Shrews: +2 | 14:23 |
pabelanger | thanks | 14:23 |
*** dkranz has joined #zuul | 14:23 | |
Shrews | tobiash: the forced usage of both providers should come from the image-type. that's the bug | 14:24 |
Shrews | one provider supports vhd, the other qcow2 | 14:24 |
Shrews | max-servers does not matter here | 14:24 |
clarkb | Shrews: max servers does matter because both servers could schedule on just the qcow2 cloud or just the vhd cloud | 14:25 |
clarkb | so you need to set max servers such that both clouds are used | 14:25 |
tobiash | that's what I meant | 14:27 |
Shrews | ah, b/c we don't specify anything other than label (format doesn't matter). i see what you mean now. | 14:28 |
tobiash | yepp | 14:28 |
tobiash | :) | 14:28 |
pabelanger | clarkb: jeblair: have a moment to add https://review.openstack.org/#/c/465962 and https://review.openstack.org/#/c/465968/ to your review queue. Should fix some flapping testing in nodepool thanks to Shrews | 14:28 |
tobiash | have to leave now, it's eod for me | 14:29 |
Shrews | well, that presents an issue then because of commit 1fdb05. we don't have a way to force one provider over the other | 14:33 |
clarkb | right, we never did in the past either | 14:33 |
Shrews | if we set max-servers to 1, the test could hang | 14:33 |
clarkb | so what we did was require more nodes than each individual max-servers which forced them onto the various clouds | 14:34 |
clarkb | Shrews: I think thats a bug then and test is working properly? | 14:34 |
Shrews | no, that's a design decision to pause processing if we've reached max-servers | 14:35 |
clarkb | Shrews: but only per cloud right? | 14:35 |
Shrews | right | 14:35 |
clarkb | the other cloud should still process | 14:35 |
clarkb | so if this hanging causes the test to fail thats a bug? | 14:35 |
Shrews | it will, if it gets other requests | 14:35 |
clarkb | the idea here is you have demand for 2 instances and two clouds, one server each. Cloud A should fulfill half the demand and cloud B the other half | 14:36 |
clarkb | nodepool should handle that case | 14:36 |
Shrews | nope. we don't guarantee equal distribution | 14:36 |
clarkb | you do if there is no other option though | 14:36 |
Shrews | it's whichever provider responds first | 14:36 |
clarkb | if cloud A is servicing half the demand it cannot service the other half | 14:37 |
clarkb | which will force cloud B to service it | 14:37 |
clarkb | (this is why max-servers is important in this test) | 14:37 |
Shrews | this test has 2 node requests. if max-servers=1, and both requests happen to go to provider A, then A will hang until a node is freed up. that is purposeful | 14:38 |
clarkb | why would both go to provider A? | 14:38 |
Shrews | we do not see that, "oh A is filled up, let's try B" | 14:38 |
Shrews | because both providers are just processing a queue | 14:38 |
clarkb | provider A services first one, says I can't do anymore and stops. provider B says "I can fulfill this and so does" | 14:38 |
clarkb | Shrews: right but provider A should stop once it is "full" | 14:39 |
clarkb | leaving provider B to do the remainder of the work | 14:39 |
Shrews | it does stop. but it does not relinquish any requests assigned to it | 14:39 |
clarkb | I thought it did, it would reject it passing it onto the next provider | 14:39 |
Shrews | nope | 14:39 |
clarkb | then if all providers reject it its a failed request | 14:39 |
clarkb | (I seem to recall reviewing this code a while back) | 14:39 |
clarkb | in any case, if ^ isn't the behavior I would say it's a bug | 14:40 |
clarkb | and the test is catching it | 14:40 |
clarkb | A should relinquish the request if it is full and let another cloud that isn't take it on | 14:40 |
clarkb | the request record tracks the failed requests for this reason; then it knows if all providers failed | 14:40 |
jeblair | clarkb: being at capacity isn't a reason to fail the request though -- if it were, we'd end up failing jobs when all our providers were at capacity | 14:43 |
clarkb | it is a reason to not service it though? | 14:44 |
Shrews | algorithm is outlined in https://specs.openstack.org/openstack-infra/infra-specs/specs/zuulv3.html, fwiw | 14:44 |
jeblair | the next logical thing to do would be to return the request to the queue unfulfilled, however, that would lead to starving large requests when we're at capacity (because only the small ones would get picked up as each provider released one node at a time) | 14:44 |
clarkb | if we look at this situation as more than just a test, the behavior seems clearly wrong. It means that if you are in a resource constrained env you won't run your next test until current test finishes, even though 50% of total capacity is available | 14:45 |
jeblair | so the algorithm (which is admittedly not perfect), has one rough edge and this is it -- in a multiple provider situation when at least one provider is at capacity, some requests will end up being delayed slightly longer than theoretically necessary. but never more than one such request per provider. | 14:46 |
clarkb | could we avoid grabbing the request in the first place? | 14:46 |
clarkb | I guess problem is still larger requests if doing ^ | 14:47 |
jeblair | clarkb: to your point: yes, it is wrong (in that it is not ideal), but intentionally so. | 14:47 |
jeblair | clarkb: yeah, all the solutions i could think of to this involved communication between the launchers, which i wanted to avoid in our first pass at this. we could decide it's worthwhile and make them more cooperative in the future. but that adds some complexity. | 14:48 |
clarkb | maybe it is time for providers to coordinate available resources in zk? | 14:48 |
jeblair | i would still like to wait until after we've put this into production. :) | 14:49 |
clarkb | my big concern here is that tests like this come out of production issues | 14:49 |
clarkb | (we know we get constrained and we know we have to support multiple image types and we have had breakages with both things in the past hence the test) | 14:49 |
jeblair | clarkb: sure, but this test isn't testing the edge case of this algorithm. we can adjust the test to deal with the new constraint. | 14:50 |
clarkb | I'm not sure how you'd test the multiple image things if a single provider can grab all requests? | 14:51 |
jeblair | clarkb: a provider only holds a single unfulfilled request -- so basically it grabs the request which puts it at capacity, then the one that puts it above capacity, then stops. | 14:52 |
clarkb | I see so it is bounded to N+1 | 14:52 |
clarkb | so we'd need to boot a demand of 3 not 2 then it should work? | 14:52 |
jeblair | yep. so the test could probably (i haven't looked -- just woke up and all) be adjusted to allow for an extra request for each image | 14:52 |
jeblair | i imagine so (and perhaps adjust some assertions or something) | 14:53 |
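
The "bounded to N+1" behaviour discussed above can be sketched with a toy model: a provider keeps taking requests until one does not fit, holds that single request, and pauses. This is an illustrative simulation under simplifying assumptions (one node per request, a worst-case ordering where provider A always wins the race); `Provider`, `drain_greedy`, and `worst_case` are hypothetical names and not nodepool code.

```python
from collections import deque


class Provider:
    """Toy provider that holds at most one request beyond its capacity."""

    def __init__(self, name, max_servers):
        self.name = name
        self.max_servers = max_servers
        self.fulfilled = []  # requests that received a node
        self.held = None     # the single request waiting for capacity

    @property
    def paused(self):
        return self.held is not None

    def take(self, request):
        """Launch a node if there is room; otherwise hold the request."""
        if len(self.fulfilled) < self.max_servers:
            self.fulfilled.append(request)
        else:
            self.held = request


def drain_greedy(queue, first, others):
    """Worst case: `first` grabs requests until it pauses, then others run."""
    for provider in [first, *others]:
        while queue and not provider.paused:
            provider.take(queue.popleft())


def worst_case(demand, max_servers=1):
    """Return (fulfilled count, paused?) for two providers A and B."""
    queue = deque(f"req-{i}" for i in range(demand))
    a, b = Provider("A", max_servers), Provider("B", max_servers)
    drain_greedy(queue, a, [b])
    return {p.name: (len(p.fulfilled), p.paused) for p in (a, b)}


print(worst_case(2))  # {'A': (1, True), 'B': (0, False)}
print(worst_case(3))  # {'A': (1, True), 'B': (1, False)}
```

In this model a demand of 2 can leave provider B idle while A sits on a held request, whereas a demand of 3 forces at least one request onto B, which matches the suggestion to request one more node than the combined capacity would seem to need.
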
Shrews | i'm changing the test to actually verify the uploaded image, rather than the booted node | 14:55 |
Shrews | since that's what is really important here | 14:55 |
jeblair | or that :) | 14:55 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Fix test_node_vhd_and_qcow2 to validate uploads https://review.openstack.org/465999 | 15:03 |
*** rcarrillocruz has quit IRC | 15:05 | |
*** jkilpatr has quit IRC | 15:16 | |
*** jkilpatr has joined #zuul | 15:16 | |
*** dougbtv__ is now known as dougbtv | 15:18 | |
pabelanger | Shrews: jeblair: any objections for adding a python35 non voting or experimental job for nodepool? Hope to continue pushing up patches today | 15:18 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool feature/zuulv3: Use six.reraise for python3 https://review.openstack.org/463594 | 15:19 |
clarkb | pabelanger: how close is it to being able to run something that passes (thinking non voting if its close) | 15:19 |
pabelanger | clarkb: I have it running locally, all tests passing. Just need to push up bytes changes and some other nits | 15:20 |
Shrews | pabelanger: i think we really need that if we don't want to regress current work :) | 15:20 |
pabelanger | Shrews: should I recheck tests that failed on topic:py3-nodepool or hold off until your stack is merged? | 15:22 |
pabelanger | I guess I could rebase a top of it | 15:22 |
Shrews | pabelanger: i just did that for you | 15:22 |
pabelanger | nice | 15:22 |
Shrews | the fails i fixed are very rare, so hopefully they'll pass this time w/o rebase | 15:23 |
Shrews | or my fixes merging | 15:23 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Add missing cleanup to statsd fixture https://review.openstack.org/465962 | 15:30 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Add test config with min-ready of 0 https://review.openstack.org/465968 | 15:30 |
jeblair | pabelanger: did you want to look at 451470? it's approved but based on an abandoned patch. | 15:30 |
pabelanger | jeblair: I have not, let me fix that | 15:31 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Add max-ready-age to label config https://review.openstack.org/463338 | 15:34 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Support externally managed images https://review.openstack.org/458073 | 15:34 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool feature/zuulv3: Add .zuul.yaml support https://review.openstack.org/466026 | 15:51 |
pabelanger | ^also adds nodepool project to zuulv3-dev.o.o for more converge | 15:52 |
pabelanger | coverage* | 15:52 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Fix test_node_vhd_and_qcow2 to validate uploads https://review.openstack.org/465999 | 15:59 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Fix test_node_vhd_and_qcow2 to validate uploads https://review.openstack.org/465999 | 16:02 |
*** rcarrillocruz has joined #zuul | 16:02 | |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool feature/zuulv3: Set socket timeout for SSH keyscan https://review.openstack.org/451470 | 16:03 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Fix test_node_vhd_and_qcow2 to validate uploads https://review.openstack.org/465999 | 16:03 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Ensure zookeeper_servers is a list https://review.openstack.org/463880 | 16:04 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Sort flavors with operator.itemgetter('ram') https://review.openstack.org/463998 | 16:05 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Replace dict.iteritems() with dict.items() for python3 https://review.openstack.org/464026 | 16:05 |
pabelanger | Shrews: did we recently fix http://logs.openstack.org/34/464034/4/check/nodepool-coverage-ubuntu-xenial/43e54e5/console.html#_2017-05-18_15_28_41_773418 ? | 16:06 |
pabelanger | or is that one new | 16:06 |
pabelanger | looks like our delete failed because it still had data in zk | 16:06 |
Shrews | no, that one is the kazoo recursive delete of a lock race | 16:06 |
jeblair | i'm ready to fork that shared lock implementation into our zk.py, since there's no action on the upstream pr | 16:07 |
Shrews | basically, we're in the middle of deleting a znode (that contains the lock as a child), and something else comes along and locks it during the delete | 16:07 |
pabelanger | ah | 16:08 |
pabelanger | jeblair: seems reasonable, then delete if / when upstream merges it? | 16:09 |
jeblair | pabelanger: yep | 16:09 |
pabelanger | +1 | 16:09 |
pabelanger | jeblair: mind confirming 463594 is the right approach for re-raising the exception? It looks like we are using this process in zuul/scheduler.py | 16:15 |
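
For context on the re-raise question: on Python 3, `six.reraise(tp, value, tb)` roughly amounts to raising the stored exception with its original traceback attached. The sketch below uses only the standard library to show that pattern; it is not the code from change 463594, and `capture` and `reraise` are hypothetical names.

```python
import sys


def capture():
    """Run a failing operation and stash its exc_info instead of raising."""
    try:
        {}["missing"]
    except KeyError:
        return sys.exc_info()


def reraise(exc_info):
    """Re-raise a stashed exception with its original traceback (Python 3)."""
    _, value, tb = exc_info
    raise value.with_traceback(tb)
```

The same exception object comes back out of `reraise`, and its traceback still includes the frame where the error first occurred, which is the property the Python 2 three-argument `raise` provided.
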
Shrews | i don't remember how read locks alleviate this situation :/ | 16:16 |
jeblair | Shrews: i'm sure we'll figure it out again :) | 16:16 |
jeblair | pabelanger: i'll have to do that later, i'm knee deep in pipeline requirements now | 16:18 |
pabelanger | np | 16:18 |
*** bhavik1 has joined #zuul | 16:27 | |
*** bhavik1 has quit IRC | 16:29 | |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Update exception message handling for python3 https://review.openstack.org/464034 | 16:40 |
*** isaacb has quit IRC | 16:48 | |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool feature/zuulv3: Use six.reraise for python3 https://review.openstack.org/463594 | 16:51 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool feature/zuulv3: Switch to next(generator) for python3 https://review.openstack.org/464040 | 16:51 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool feature/zuulv3: Python3: RuntimeError: dictionary changed size during iteration https://review.openstack.org/466049 | 16:51 |
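
The `RuntimeError: dictionary changed size during iteration` named in the change above is a common Python 3 porting failure: `dict.keys()` and `dict.items()` return live views rather than lists, so deleting entries inside the loop is an error. A minimal sketch of the usual fix, iterating over a snapshot, follows; `drop_expired` is a hypothetical helper and not taken from the nodepool change.

```python
def drop_expired(cache, is_expired):
    """Remove expired entries while iterating over a snapshot of the keys."""
    # list(cache) copies the keys, so deleting from cache inside the
    # loop no longer invalidates the iterator.
    for key in list(cache):
        if is_expired(cache[key]):
            del cache[key]
    return cache
```
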
jlk | o/ | 17:08 |
* SpamapS testing py3k branch with gear 0.9.0 | 17:25 | |
SpamapS | Ran 243 (+27) tests in 340.848s (+17.905s) | 17:25 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: skip py3 failing tests https://review.openstack.org/463903 | 17:26 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: fix webapp tests for py3 https://review.openstack.org/463902 | 17:26 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: misc py3 changes https://review.openstack.org/463901 | 17:26 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: base64 changes for py3 https://review.openstack.org/463900 | 17:26 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Encoding changes in tests for py3 https://review.openstack.org/463899 | 17:26 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Explicitly decode decrypted secrets for py3 https://review.openstack.org/464049 | 17:26 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: view changes for py3 https://review.openstack.org/463898 | 17:26 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: writing yaml to disk needs bytes https://review.openstack.org/463897 | 17:26 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Fixes for test_model in py3 https://review.openstack.org/463896 | 17:26 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: None does not compare to int https://review.openstack.org/463895 | 17:26 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: make Job and ZuulRole hashable https://review.openstack.org/463894 | 17:26 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: py3 hashlib error https://review.openstack.org/463893 | 17:26 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: py3 Changes in __del__ for gitpython https://review.openstack.org/463892 | 17:26 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Use gear Text interface https://review.openstack.org/461468 | 17:26 |
jeblair | jlk, mordred: i've been thinking about the next three github changes, and i think three things are pushing me toward thinking that we should rework some stuff now. 1) there's terminology collisions which are making the config file confusing (not to mention the code). 2) there's some shoehorning of the github data model into gerrit which i think will constrain us later. 3) as we discussed earlier, pipeline requirements are borderline ... | 17:27 |
jeblair | ... driver-specific (what does it mean to evaluate a gerrit change against a pipeline requirement for a github status of "zuul:check:success"?) | 17:27 |
jeblair | jlk, mordred: i've written that up with more words and a suggested refactoring here: https://etherpad.openstack.org/p/zKrBKCsc7u | 17:27 |
*** Cibo_ has joined #zuul | 17:28 | |
jlk | I fully expected such a conversation :) | 17:28 |
jlk | I kind of think that we shouldn't have any driver specific constraints at the pipeline level | 17:28 |
jeblair | jlk, mordred: can you take a look at that and see if the proposed design makes sense, and also evaluate whether now is the right time for that? | 17:28 |
jeblair | jlk: neat! that's the opposite of my proposal :) | 17:29 |
jlk | I'll read the text, then respond :) | 17:29 |
SpamapS | mordred: btw, remember we were talking about .testrepository fail? I talked to mtreinish and he pointed me at the solution: https://pypi.python.org/pypi/stestr | 17:30 |
* SpamapS preps a patch | 17:30 | |
*** harlowja has joined #zuul | 17:30 | |
Shrews | oh, speaking of py3 things... jeblair, pabelanger, SpamapS: i want to propose removing zuul's use of the hacking lib. the reason is that it hasn't quite caught up to py3.5 things (e.g., pyflakes is older and chokes on asyncio 3.5 support). | 17:50 |
SpamapS | +1 from me | 17:52 |
SpamapS | didn't know we weren't | 17:52 |
SpamapS | I imagine we'll want to turn off a bunch of them | 17:52 |
Shrews | it is on their radar, however: https://governance.openstack.org/tc/goals/pike/python35.html | 17:52 |
SpamapS | py35 is the only thing I've tested | 17:57 |
SpamapS | or would really want to bother with | 17:57 |
Shrews | the current code is cool with it. the only problem i've seen so far is the websocket stuff failing pep8 http://logs.openstack.org/53/463353/10/check/gate-zuul-pep8-ubuntu-xenial/41f82e6/console.html.gz#_2017-05-11_16_48_56_738736 | 17:58 |
Shrews | which is where i stopped before i caught the plague | 17:59 |
pabelanger | do we know if anybody is looking at hacking ATM? | 17:59 |
SpamapS | ooooo the entire py3k stack passes tests | 18:00 |
SpamapS | _SHIP IT_ | 18:00 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool feature/zuulv3: Fix typos in __eq__ functions https://review.openstack.org/466061 | 18:00 |
pabelanger | Shrews: ^ a few syntax errors | 18:01 |
*** harlowja has quit IRC | 18:01 | |
Shrews | pabelanger: doh | 18:01 |
pabelanger | surprisingly, exposed with python3 | 18:01 |
*** DangerousDaren has quit IRC | 18:02 | |
pabelanger | py35: commands succeeded | 18:03 |
pabelanger | same for nodepool :D | 18:03 |
pabelanger | actually, just 1 thing left | 18:03 |
pabelanger | preexec_fn=self._activate_virtualenv is bombing | 18:03 |
pabelanger | Shrews: mind +3 again on https://review.openstack.org/#/c/464040/ | 18:05 |
pabelanger | was a rebase | 18:05 |
Shrews | pabelanger: done. and a question on 466049 | 18:07 |
*** Cibo_ has quit IRC | 18:08 | |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Switch to next(generator) for python3 https://review.openstack.org/464040 | 18:08 |
pabelanger | Shrews: k | 18:09 |
jeblair | Shrews: i'm okay with removal of hacking. we mostly use it just to avoid the pep8 version treadmill. if it's actually holding us back on some py3 things, there's no value there. | 18:14 |
*** harlowja has joined #zuul | 18:15 | |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool feature/zuulv3: Python3: encode / decode data as utf8 https://review.openstack.org/466065 | 18:15 |
Shrews | ok. i'll try to figure out what libs we need to specify manually in replacing it. hopefully not many | 18:15 |
clarkb | I think just pep8 and pyflakes | 18:15 |
jeblair | ya those | 18:15 |
clarkb | oh and flake8 for the config | 18:15 |
clarkb | which is in tox.ini | 18:15 |
Shrews | oh lawd... that opens up a bunch of new pep8 errors | 18:21 |
jeblair | Shrews: we likely want to pin on whatever versions hacking was pinning on | 18:21 |
Shrews | well that was the problem | 18:22 |
Shrews | flake8 was too old | 18:22 |
Shrews | it's actually not many new ones. | 18:22 |
jeblair | Shrews: bump flake8 but not pep8? | 18:22 |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Remove use of hacking lib https://review.openstack.org/466067 | 18:23 |
mordred | jeblair: sorry - been writing specs all morning - looking now | 18:24 |
jeblair | Shrews: what's f405? | 18:25 |
Shrews | 1 sec... amending commit msg | 18:25 |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Remove use of hacking lib https://review.openstack.org/466067 | 18:25 |
Shrews | jeblair: read new commit msg | 18:25 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool feature/zuulv3: Python3: RuntimeError: dictionary changed size during iteration https://review.openstack.org/466049 | 18:26 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool feature/zuulv3: Python3: encode / decode data as utf8 https://review.openstack.org/466065 | 18:26 |
jeblair | Shrews: ok. bummer. i hate to turn off a pyflakes check, but that's deep into the ansible magic. maybe we can sprinkle more #noqa around and re-enable it | 18:28 |
Shrews | jeblair: it also reports that error for the time and select libs, neither of which are imported explicitly | 18:28 |
Shrews | jeblair: i can work on re-enabling all of those individually | 18:29 |
mordred | star imports are frequently used in ansible and in ansible modules ... so that one might wind up being transitively difficult | 18:30 |
Shrews | mordred: the answer is to obviously stop using ansible. i hear this hudson thing is pretty good | 18:30 |
mordred | Shrews: good call | 18:30 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool feature/zuulv3: Wrap map() in list() for python3 https://review.openstack.org/466069 | 18:31 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool feature/zuulv3: WIP: Fix preexec_fn for python3 https://review.openstack.org/466070 | 18:31 |
jeblair | Shrews: cool. just remember to leave E125 and E129. also, any other E or W you want to leave are fine with me too. :) | 18:32 |
SpamapS | oh weird.. py3k failed 2 patches in the middle | 18:32 |
pabelanger | okay, 466070 should be green for python35 | 18:33 |
pabelanger | but it is WIP because I need to understand why it is failing | 18:33 |
pabelanger | it is the magic virtualenv enabling logic | 18:33 |
pabelanger | personally, would rather just remove that code and have the user wrap DIB virtualenv in shell if possible | 18:34 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Switch from testrepository to stestr https://review.openstack.org/466071 | 18:35 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Fix typos in __eq__ functions https://review.openstack.org/466061 | 18:39 |
Shrews | mordred: all of the files in zuul/ansible/action are just straight copies of the ansible modules, yeah? | 18:39 |
mordred | Shrews: well - they're modified copies | 18:39 |
Shrews | k | 18:40 |
mordred | Shrews: some of them are completely new code that basically subclasses the ansible module, peeks at a parameter and potentially fails | 18:40 |
mordred | but those should be fairly easy to notice :) | 18:40 |
Shrews | i was thinking of having flake8 ignore that entire directory | 18:40 |
jlk | tobiash: I'm running into an issue doing a dib inside a container, a failure to communicate with device-mapper driver. This is toward the end, a long build process. Are you doing anything special to expose these things inside the container for the builder ? | 18:44 |
mordred | jlk: fwiw - if you want to iterate on getting dib to work in a container with a shorter build process, just make the list of elements have only "ubuntu" or "ubuntu-minimal" and "vm" and nothing else - the mechanisms for making the image themselves should all get exercised by that without you needing to wait for content to be written into the chroot | 18:46 |
jlk | ah | 18:46 |
clarkb | and add devuser if you want to boot it and login | 18:47 |
jlk | not as of yet | 18:47 |
jlk | that's so much faster | 18:49 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool feature/zuulv3: Python3: Use exec for self._activate_virtualenv() https://review.openstack.org/466070 | 18:54 |
* SpamapS rebases py3k on feature/zuulv3 and finds that the github stuff is all t3h broken in py3 | 18:54 | |
jlk | jeblair: okay I've read what you wrote. | 18:55 |
pabelanger | k, that should be nodepool for py3 support | 18:55 |
pabelanger | I should see what is needed to run nodepool-dsvm python3 job also | 18:55 |
Shrews | hrm, testing env upgrade breaks zuul tests. neat | 18:56 |
jlk | jeblair: if require/reject grow driver specific capability, would we still support generic requirements, for shared states, like 'open', 'latest', etc..? Similar to how we talked about allowing generic reporters that are driver agnostic. | 18:56 |
*** harlowja has quit IRC | 18:56 | |
jlk | or should all of the require/reject exist inside driver implementation? | 18:56 |
clarkb | pabelanger: should just need to set the python version option and devstack will do the rest | 18:57 |
pabelanger | jlk: mordred: ubuntu-rootfs is even smaller: https://review.openstack.org/#/c/413115/ got it down to about 42MB for xenial | 18:57 |
clarkb | pabelanger: I think setting it in the plugin will be too late though, need a new job or update the existing job | 18:57 |
pabelanger | clarkb: k, I'll look | 18:58 |
pabelanger | woot | 19:01 |
pabelanger | http://logs.openstack.org/70/466070/2/check/gate-nodepool-python35-nv/019d010/console.html#_2017-05-18_19_00_00_254400 | 19:01 |
SpamapS | incoming py3k rebase patch bomb | 19:05 |
SpamapS | With a fun surprise at the end | 19:06 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: skip py3 failing tests https://review.openstack.org/463903 | 19:06 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: fix webapp tests for py3 https://review.openstack.org/463902 | 19:06 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: misc py3 changes https://review.openstack.org/463901 | 19:06 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: base64 changes for py3 https://review.openstack.org/463900 | 19:06 |
Shrews | cookies? | 19:07 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Encoding changes in tests for py3 https://review.openstack.org/463899 | 19:07 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Explicitly decode decrypted secrets for py3 https://review.openstack.org/464049 | 19:07 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: view changes for py3 https://review.openstack.org/463898 | 19:07 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: writing yaml to disk needs bytes https://review.openstack.org/463897 | 19:07 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Fixes for test_model in py3 https://review.openstack.org/463896 | 19:07 |
Shrews | plz be cookies... | 19:07 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: None does not compare to int https://review.openstack.org/463895 | 19:07 |
jlk | any way to tell nodepool to try building an image once, and stop if it fails? | 19:07 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: make Job and ZuulRole hashable https://review.openstack.org/463894 | 19:07 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: py3 hashlib error https://review.openstack.org/463893 | 19:07 |
Shrews | plz be cookies... | 19:07 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: py3 Changes in __del__ for gitpython https://review.openstack.org/463892 | 19:07 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: fix imports in py3 https://review.openstack.org/463891 | 19:07 |
Shrews | plz be cookies... | 19:07 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Use gear Text interface https://review.openstack.org/461468 | 19:07 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Fix github driver tests for py3 https://review.openstack.org/466078 | 19:07 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Enable py3 tests https://review.openstack.org/466079 | 19:07 |
SpamapS | Shrews: py3 cookies | 19:07 |
*** harlowja has joined #zuul | 19:07 | |
jlk | any way to tell nodepool to try building an image once, and stop if it fails? | 19:08 |
jlk | pabelanger: that didn't merge, is that not currently available? | 19:08 |
pabelanger | jlk: ya, you'd have to download the patchset | 19:14 |
pabelanger | maybe we should just add it to project-config at this point | 19:15 |
pabelanger | I've been getting a lot of push back from DIB on them | 19:15 |
pabelanger | but don't understand why | 19:15 |
*** tobiash_ has joined #zuul | 19:24 | |
jlk | wtf.. | 19:25 |
jlk | Journal size too big for filesystem. | 19:25 |
tobiash_ | jlk: I had issues with dib builds inside the docker filesystem | 19:25 |
tobiash_ | jlk: so I mount it from outsife | 19:25 |
tobiash_ | outside | 19:25 |
jlk | what's your launch look like? | 19:26 |
tobiash_ | jlk: I'm also not using the device mapper driver (for this to work you probably have to mount /sys stuff into the container) | 19:27 |
tobiash_ | jlk: I'm currently using the aufs driver (nodepool runs in docker on ubuntu xenial) | 19:27 |
tobiash_ | and I think it needs to run privileged | 19:28 |
jlk | not sure what I'm running | 19:28 |
tobiash_ | but I have to check again my docker-compose file @work tomorrow | 19:28 |
jlk | Storage Driver: overlay2 | 19:28 |
jlk | Backing Filesystem: extfs | 19:28 |
jlk | Volume: local | 19:29 |
tobiash_ | jlk: I had problems with any storage driver until I mounted the dirs for tmp, elements and image together from the host (ext4) into the container | 19:30 |
tobiash_ | so currently I have /mnt/builder/tmp, /mnt/builder/image, /mnt/builder/tmp on the host and mount /mnt/builder into the container at some location and configure nodepool to use that (also tell dib to use the correct tmp dir) | 19:31 |
clarkb | dib needs to be able to do things like mount filesystems and use losetup and nbd iirc | 19:32 |
clarkb | (so that's where privileged likely comes in) | 19:32 |
tobiash_ | that's the reason for privileged container | 19:32 |
jlk | yeah I have privileged set up | 19:32 |
tobiash_ | but just for the builder | 19:32 |
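The setup tobiash_ describes (a privileged builder container with the host's /mnt/builder bind-mounted into it) can be sketched as a `docker run` argument list. This is a minimal sketch, not his actual docker-compose file: the image name `nodepool-builder` is a hypothetical placeholder, and mounting the host path at the same location inside the container is an assumption made for simplicity.

```python
def builder_docker_args(host_root="/mnt/builder", image="nodepool-builder"):
    """Return a `docker run` argument list for a privileged builder container.

    host_root is bind-mounted at the same path inside the container so that
    dib's tmp dir and nodepool's image dir live on the host filesystem.
    The image name is a hypothetical placeholder.
    """
    return [
        "docker", "run",
        "--privileged",                    # dib needs to mount filesystems and use loop devices
        "-v", f"{host_root}:{host_root}",  # bind mount the dib working area from the host
        image,
    ]


if __name__ == "__main__":
    print(" ".join(builder_docker_args()))
```

Nodepool and dib would still need to be configured to use directories under the mounted path; that part is left out here because the log does not show the configuration.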
pabelanger | IIRC, I had DIB in docker, but pretty sure it was privileged | 19:32 |
jlk | it's just apparently struggling around loopback stuffs | 19:33 |
pabelanger | I should check my notes, it was working for me back in Jan and I don't think I had any specific bind mounts going on | 19:33 |
jlk | https://github.com/j2sol/z8s has my work in progress | 19:33 |
pabelanger | https://github.com/pabelanger/docker-diskimage-builder | 19:33 |
pabelanger | basically used ansible and dib to build the container | 19:33 |
leifmadsen | https://coreos.com/blog/introducing-zetcd | 19:33 |
pabelanger | then ran it under docker | 19:34 |
tobiash_ | jlk: so basically I use this dockerfile: https://review.openstack.org/#/c/465852/ | 19:35 |
tobiash_ | and a similar command like you in the docker-compose except that I'm mounting the dib workspace into the container | 19:36 |
tobiash_ | otherwise dib broke latest at the 'copy everything into the diskimage' step | 19:36 |
pabelanger | http://git.openstack.org/cgit/openstack/ansible-role-diskimage-builder is what I used to install DIB, then just imported that into docker | 19:36 |
tobiash_ | I didn't figure out why (tried several storage drivers) but when mounting that just into the container it's fine | 19:37 |
tobiash_ | pabelanger: right, I also didn't need any specific bind mounts other than the working area of dib | 19:39 |
clarkb | bind mounting the workspace has the potential upside of being able to share dib caches | 19:40 |
clarkb | (regardless of the weirdness around why it seems to be required) | 19:40 |
*** dougbtv_ has joined #zuul | 19:41 | |
*** dougbtv has quit IRC | 19:41 | |
tobiash_ | what I also noticed is that if the dib build failed (e.g. due to some transient network failure) nodepool leaves the tmp dir of the build untouched | 19:41 |
tobiash_ | that filled up the disk a few times (until I had monitoring in place) | 19:42 |
jlk | whats the dib workspace? | 19:43 |
jeblair | SpamapS, mordred: er, can we hold off on this stestr thing? | 19:43 |
jlk | the part that you needed volume mounted? Looking at the docker file, the volumes don't make much sense, other than the elements | 19:44 |
jlk | tobiash_: ^ | 19:44 |
tobiash_ | jlk: I don't know how to call it correctly but with that I mean the elements folder, the tmp dir (which you tell dib) and the dest dir of the image (configured in nodepool) | 19:44 |
greghaynes | jlk: you still getting journal too big for fs errors? | 19:45 |
tobiash_ | jlk: ups, looks like these volume declarations mostly resulted from earlier refactorings (still has to be polished a bit) | 19:45 |
jlk | greghaynes: I was, trying to do a volume mount now. | 19:46 |
*** dougbtv__ has joined #zuul | 19:46 | |
tobiash_ | jlk: journal too big errors were gone when bind mounting tmp and dest into the container | 19:46 |
greghaynes | ah. There's some weirdness there because dib images generally are resized up a large amount, so if dib used the default journal size for the size of disk it makes, there wouldn't be a large enough journal when the image gets resized | 19:46 |
greghaynes | so it forces a larger journal than needed | 19:46 |
greghaynes | which involves some weird math that had some edge cases | 19:47 |
jlk | okay | 19:47 |
jlk | I'm trying with /tmp/ and /opt/nodepool/images volume mounted | 19:47 |
tobiash_ | jlk: the volumes declared in the dockerfile more or less define the places where I bind-mount the nodepool config currently | 19:48 |
tobiash_ | /var/log/nodepool: getting logs from nodepool out of the container | 19:48 |
tobiash_ | /etc/nodepool: nodepool config | 19:48 |
tobiash_ | /etc/openstack: place for clouds.yaml | 19:48 |
*** dougbtv_ has quit IRC | 19:49 | |
greghaynes | jlk: hrm, I wonder if that might be causing some of the journal size math to get messed up | 19:49 |
tobiash_ | /opt/setup-scripts: I think is outdated as v3 has no setup scripts anymore | 19:49 |
jlk | somewhat annoying to watch the container output like a hawk to catch the dib error before it starts over again. | 19:50 |
tobiash_ | I think that journal size stuff has some weird interference with the docker storage driver if it is created within the storage managed by docker | 19:50 |
tobiash_ | jlk: you could just run more than one builder and have a chance that the next start is on a different one ;) | 19:51 |
jeblair | SpamapS, mordred: for the first time in years, i'm almost as happy as i can be with how the tests are running; i kinda just want to get on with the work of writing code, and debugging a new test runner just sounds like distraction we don't need right now. | 19:52 |
clarkb | jlk: can't you just grab the image build log? not sure why you'd have to watch it like a hawk | 19:53 |
jlk | not sure where the build log goes | 19:53 |
jlk | I don't have a logging config, so everything is dumping to stdout/err | 19:53 |
jeblair | jlk: regarding require/reject -- i think ultimately because of the way the configuration is set up, it will end up being driver specific. but perhaps we can put the implementation of 'open' and 'current-patchset' in the base filter class so it's easy for drivers to implement those? | 19:55 |
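jeblair's suggestion (keep filters driver-specific, but put the handling of 'open' and 'current-patchset' in a base filter class) might look roughly like the sketch below. All class and attribute names here are hypothetical illustrations, not Zuul's actual API.

```python
class Change:
    """Hypothetical stand-in for a change or pull request."""

    def __init__(self, is_open, is_current_patchset, labels=()):
        self.is_open = is_open
        self.is_current_patchset = is_current_patchset
        self.labels = list(labels)


class BaseRefFilter:
    """Shared handling of the generic 'open' and 'current-patchset' checks."""

    def __init__(self, open=None, current_patchset=None):
        # None means "no requirement configured for this attribute".
        self.open = open
        self.current_patchset = current_patchset

    def matches(self, change):
        if self.open is not None and change.is_open != self.open:
            return False
        if (self.current_patchset is not None
                and change.is_current_patchset != self.current_patchset):
            return False
        return True


class LabelRefFilter(BaseRefFilter):
    """Hypothetical driver-specific filter adding a label requirement."""

    def __init__(self, required_labels=(), **kwargs):
        super().__init__(**kwargs)
        self.required_labels = set(required_labels)

    def matches(self, change):
        # Generic checks first, then the driver-specific one.
        return (super().matches(change)
                and self.required_labels <= set(change.labels))
```

A driver would subclass the base filter and only add what is specific to it, so the generic states do not have to be reimplemented per driver.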
jlk | nodepool-builder_1 | 2017-05-18 19:59:30,934 INFO nodepool.builder.BuildWorker.0: DIB image ubuntu-xenial is built | 19:59 |
jlk | tobiash_: high five! | 19:59 |
tobiash_ | \o/ | 19:59 |
* jlk brings the whole thing up to see if it gets tossed at the cloud | 20:02 | |
*** harlowja has quit IRC | 20:03 | |
*** tobiash_ has quit IRC | 20:12 | |
jeblair | jlk: what do you think? should we go with https://etherpad.openstack.org/p/zKrBKCsc7u or something else? | 20:19 |
jlk | awww yissss. Image uploaded, VM booted. node registered. Stuffed a github webhook event into the webapp and it did the end to end thing (although noop task so it didn't actually touch the vm) | 20:19 |
jlk | jeblair: I think it's doable | 20:19 |
jlk | without having looked too deeply at implementation yet | 20:19 |
jeblair | jlk: you think it's worth doing? our other options are either to abstract everything (which we've generally been avoiding), or eliminate pipeline requirements which can't be easily abstracted. we're using all of them now, and this kind of expressiveness seems worth keeping to me: http://git.openstack.org/cgit/openstack-infra/project-config/tree/zuul/layout.yaml#n41 | 20:23 |
jlk | With the new plan, that layout would have to get modified right? To push those existing require into a gerrit: block? | 20:24 |
jlk | just making sure I understand. | 20:24 |
jeblair | jlk: yep, but otherwise look about the same i think | 20:25 |
jlk | I _do_ think it's worth doing. This was one area we identified as needing better driver integration, so that we wouldn't necessarily have to mess with model.py directly | 20:25 |
jlk | define the filters in the driver code, allow them to be plugged in | 20:25 |
jeblair | jlk: okay, i'll -1 the top of the stack with that. do you want me to write the change to add the filter driver api, or do you want to do it? | 20:29 |
jlk | If you have time for it, I wouldn't mind you doing it, and I'll follow up quickly with reviews and trials to use it. Otherwise I can take it on, I'll just likely be bugging you a fair amount for guidance :) | 20:30 |
*** harlowja has joined #zuul | 20:30 | |
jeblair | jlk: okay, i'll take a stab at it. thanks :) | 20:31 |
*** harlowja has quit IRC | 20:39 | |
*** jkilpatr has quit IRC | 20:41 | |
*** dougbtv__ has quit IRC | 20:45 | |
*** dougbtv__ has joined #zuul | 20:58 | |
*** dougbtv__ is now known as dougbtv | 20:58 | |
*** dougbtv_ has joined #zuul | 21:02 | |
*** dougbtv has quit IRC | 21:05 | |
*** dougbtv__ has joined #zuul | 21:05 | |
*** dkranz has quit IRC | 21:06 | |
*** dougbtv_ has quit IRC | 21:07 | |
*** dougbtv_ has joined #zuul | 21:11 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: WIP driver-specific filters https://review.openstack.org/466105 | 21:11 |
jeblair | jlk: ^ not done yet, but there's the general shape if you wanted to take a peek. | 21:12 |
*** dougbtv has joined #zuul | 21:13 | |
jeblair | jlk: as you'll see, i'm actually leaning toward adding it to the 'source' interface of the driver -- it seems to me to be connected to that, since a source is where a change comes from, that's the context where these attributes are being evaluated | 21:14 |
*** jhesketh_ has joined #zuul | 21:14 | |
*** dougbtv__ has quit IRC | 21:14 | |
jlk | alrighty | 21:14 |
jlk | I'll look after lunching | 21:14 |
*** dougbtv_ has quit IRC | 21:15 | |
*** jkilpatr has joined #zuul | 21:16 | |
*** dougbtv_ has joined #zuul | 21:16 | |
*** jhesketh has quit IRC | 21:18 | |
SpamapS | why am I getting 88M console logs? | 21:18 |
SpamapS | (on success) | 21:19 |
*** dougbtv has quit IRC | 21:19 | |
SpamapS | http://zuulv3-dev.openstack.org/logs/80b98612c6af46eda3c2082ff122e423/ <-- for instance | 21:19 |
SpamapS | http://logs.openstack.org/93/463893/4/check/gate-zuul-python27-ubuntu-xenial/88c599a/ too | 21:20 |
clarkb | looks like it's writing all the debug logs | 21:20 |
SpamapS | I wonder if something we merged borked it | 21:20 |
clarkb | I don't see the test name headers which is what you get when things fail so I think it's just dumping stdout all the way through | 21:21 |
SpamapS | http://logs.openstack.org/65/449365/15/check/gate-zuul-python27-ubuntu-xenial/cb3d801/ | 21:22 |
SpamapS | clarkb: agreed | 21:23 |
SpamapS | just noticing that recently merged stuff has it too | 21:23 |
SpamapS | https://review.openstack.org/#/c/449365/ is the first one that has this | 21:24 |
jeblair | that's so unrelated it makes me wonder if something external changed? | 21:25 |
SpamapS | Yeah I am looking at that now | 21:26 |
clarkb | SpamapS: the parent of that change has it too | 21:27 |
SpamapS | Add support for github enterprise? | 21:28 |
SpamapS | I thought it was clean. Looking. | 21:28 |
clarkb | I'm like 5 or six changes back and they all have it too | 21:28 |
SpamapS | oh I didn't notice because they're compressed | 21:29 |
SpamapS | derp | 21:29 |
SpamapS | I5e312464b4f9a40f0ef00c00d12e7651e3890d4a doesn't have it | 21:31 |
clarkb | https://review.openstack.org/#/c/439834/ is first one to have it | 21:31 |
SpamapS | https://review.openstack.org/#/c/445644/ has it | 21:32 |
clarkb | https://review.openstack.org/#/c/439834/21/tests/base.py line 657 possibly at fault? | 21:33 |
SpamapS | hmmmmmm | 21:36 |
SpamapS | seems like there are merges _after_ that that don't have the fail | 21:36 |
SpamapS | oh | 21:36 |
SpamapS | derp-a-derp I've been looking at check results | 21:36 |
SpamapS | clarkb: that seems rather innocuous | 21:38 |
clarkb | SpamapS: it's loading the logging at "compile" time not execute time when we set up the fixtures | 21:39 |
clarkb | which I think "wins" ? | 21:39 |
SpamapS | isn't that how we setup logs all over though? | 21:39 |
clarkb | hrm ya at least for scheduler | 21:39 |
SpamapS | it's a common pattern | 21:40 |
clarkb | I know it has caused problems in the past because it's a compile time hit rather than runtime | 21:40 |
SpamapS | and I believe the fakelogger fixture takes over the output configs of logging | 21:40 |
clarkb | https://review.openstack.org/#/c/439834/21/tests/unit/test_github_driver.py that might actually be it | 21:42 |
clarkb | because it's calling basicConfig | 21:42 |
SpamapS | oh yeah | 21:42 |
SpamapS | that's it | 21:42 |
SpamapS | clarkb: you want to submit a removal of that one? | 21:42 |
SpamapS | very naughty | 21:42 |
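Why a stray `logging.basicConfig()` in a test module produces this kind of output can be shown in isolation. The sketch below is an illustration, not Zuul's test code: it temporarily empties the root logger's handler list, calls `basicConfig` the way a module-level call would, and reports what was installed before restoring the previous state.

```python
import logging


def handlers_added_by_basic_config():
    """Show what logging.basicConfig() installs on a handler-less root logger.

    Returns (handler class names, resulting root level). The root logger's
    previous handlers and level are restored before returning.
    """
    root = logging.getLogger()
    saved_handlers, saved_level = root.handlers[:], root.level
    root.handlers = []
    try:
        # What a module-level call in a test file does at import time.
        logging.basicConfig(level=logging.DEBUG)
        return [type(h).__name__ for h in root.handlers], root.level
    finally:
        root.handlers = saved_handlers
        root.setLevel(saved_level)


if __name__ == "__main__":
    print(handlers_added_by_basic_config())
```

Because the call attaches a `StreamHandler` to the root logger and sets it to DEBUG, every DEBUG record from any logger that propagates to the root is written to stderr, independent of whatever a test fixture captures.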
clarkb | SpamapS: I'm in the middle of smoking a brisket, so best if someone else gets to it | 21:43 |
SpamapS | clarkb: I'll get it, thanks! | 21:43 |
SpamapS | also can haz brisket? ;-) | 21:43 |
clarkb | I'm not sure how well it will survive transport to LA | 21:43 |
jlk | probably better than it'll survive being smoked | 21:44 |
jlk | Oh hahaha, this is my fault. SORRYYYY | 21:44 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Remove errant basicConfig in test_github_driver https://review.openstack.org/466112 | 21:46 |
mordred | jlk: well - I reviewed it and didn't catch that - so I'll share the SORRYYYY too | 21:46 |
jlk | I had no idea that nobody else was getting 90M of logs. | 21:47 |
SpamapS | errm.. no | 21:49 |
SpamapS | still printing a lot | 21:49 |
jeblair | tests/unit/test_multi_driver.py has a basicConfig | 21:50 |
SpamapS | yep just found that too | 21:50 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Remove errant basicConfig calls in test suites https://review.openstack.org/466112 | 21:50 |
mordred | and tests/unit/test_github_requirements.py | 21:50 |
SpamapS | I don't have that file | 21:51 |
jeblair | yeah, don't think that's landed yet | 21:52 |
mordred | ah - it's in "Adds github triggering from status updates" I15aef35716ddbcd1e66f84a73d27ca2689c936e4 | 21:52 |
mordred | I just happened to have my tree state with that one in there | 21:52 |
jeblair | mordred: speaking of which, jlk and i are embarking on http://paste.openstack.org/show/609964/ -- does that lgty? | 21:52 |
mordred | k. I left a comment on the patch that included the 3rd basicConfig | 21:53 |
SpamapS | 466112 seems to fix it | 21:53 |
SpamapS | suggest fast tracking it once it finishes tests | 21:54 |
mordred | jeblair: yes! sorry - I totally liked that earlier and didn't say anything | 21:54 |
jeblair | mordred: kk | 21:54 |
mordred | jeblair: we should add a "like" button to etherpads :) | 21:54 |
SpamapS | Or perhaps I should finally give up on gnome terminal and use one that doesn't destroy a CPU trying to display such DEBUG deluges | 21:54 |
mordred | jeblair: (we should absolutely not add a like button to etherpads) | 21:55 |
* jeblair and 0 other people like this | 21:55 | |
jeblair | SpamapS: no this is good. i think i can safely say i, the infra team, and our log server thank you for finding this. :) | 21:56 |
pabelanger | reminds me, we should add rsync logic to playbooks for zuulv3 to not copy insanely large log folders | 21:57 |
pabelanger | would be a good POC | 21:57 |
SpamapS | pabelanger: as in, du the folder before hand, and if it's big, refuse? I like that. | 22:01 |
SpamapS | that might even relieve us of the need for an executor side du overwatch | 22:01 |
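The "du the folder beforehand, and if it's big, refuse" idea can be sketched in a few lines. This is only an illustration of the check itself; the function names and the size limit are hypothetical, and how it would be wired into a playbook is not covered.

```python
import os


def directory_size(path):
    """Return the total size in bytes of regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):  # do not count symlinks
                total += os.path.getsize(full)
    return total


def should_copy_logs(path, max_bytes):
    """Refuse to copy a log directory that exceeds max_bytes."""
    return directory_size(path) <= max_bytes
```

A caller would run `should_copy_logs(log_dir, limit)` before starting the copy and skip or fail the step when it returns False.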
jlk | jeblair: I think we're going to wind up with some duplicated code between gerrit and github drivers at least, for pipelines, branches, refs, comments, emails, username_filters, timespecs, actions | 22:03 |
jlk | open, current_patchset | 22:03 |
SpamapS | daaaaaaaaaaaaaaamnit | 22:03 |
* SpamapS pep8's | 22:03 | |
jlk | but, can't guarantee all drivers will have that, so ¯\_(ツ)_/¯ | 22:04 |
jeblair | jlk: actually, i think just 'branches, refs, comments, emails, username'. but yeah. | 22:05 |
jlk | is pipelines covered more generically? | 22:05 |
jeblair | jlk: i think it's zuul trigger specific | 22:06 |
jlk | k | 22:06 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Remove errant basicConfig calls in test suites https://review.openstack.org/466112 | 22:06 |
jeblair | (that's another nice thing about this, the eventfilter is already quite crowded with trigger-specific stuff) | 22:06 |
SpamapS | hm | 22:17 |
SpamapS | getting random test fails now | 22:17 |
mordred | SpamapS: don't you love getting sucked down a rabbit hole? | 22:18 |
SpamapS | mordred: as long as there are cuddly rabbits at the bottom and not something awful like a worst cat or systemd. | 22:20 |
jlk | 2017-05-18 22:15:15.729107 | No handlers could be found for logger "paste.httpserver.ThreadPool" | 22:20 |
jlk | that's not an error tho | 22:20 |
SpamapS | that's been there a while | 22:24 |
SpamapS | I believe it happens because the paste server in the webapp tests adds the logger _after_ fakelogger is created | 22:25 |
* SpamapS steps back from the rabbit hole for a few | 22:27 | |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Add paste to the default list of loggers in tests https://review.openstack.org/466118 | 22:37 |
mordred | SpamapS: this ^^ might help that paste thingy | 22:37 |
jeblair | oops. i just ran the full test suite without applying the log fixes. :) | 22:40 |
mordred | jeblair: :) | 22:41 |
mordred | jeblair: was the logging unsurprisingly still weird? | 22:41 |
jeblair | mordred: it sort of spilled all over my desktop and i have a big mess now | 22:43 |
* jlk hands jeblair a broom | 22:43 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: WIP driver-specific filters https://review.openstack.org/466105 | 22:49 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: WIP driver-specific filters https://review.openstack.org/466105 | 23:24 |
jeblair | jlk: ^ i think that's just about done. i'm still trying to figure out which tests are hanging though. | 23:25 |
jeblair | jlk: there's a bit more decomposition we can do on the TriggerEvent class, but everything that's left will take a little bit more thought, and i think most of those are settled enough for now (ie, i don't think we need to revisit what "patchset" means right at this moment, but we should later). | 23:27 |
mordred | jeblair: I'm eod'ing (there are some sausages in my future) but that little logging config patch of mine failed on two RPC errors that seem troubling. I'll look tomorrow when I get up to try to sort it - but I thought I'd call your attention to it in case it troubles you too | 23:46 |
mordred | https://review.openstack.org/#/c/466118/ | 23:46 |
jeblair | mordred: i think SpamapS's logging change *also* failed on rpc errors. that does trouble me. i worry about the gear release. | 23:48 |
mordred | jeblair: yah. that's why I was troubled | 23:48 |
SpamapS | :/ | 23:49 |
SpamapS | they seem non-deterministic to me | 23:49 |
jeblair | i'm almost done with this refactor; i'll switch to looking at that then | 23:50 |
mordred | jeblair, SpamapS : luckily that was the only substantive thing in that release | 23:50 |
mordred | so at least there aren't _multiple_ unrelated work areas to poke at | 23:51 |
SpamapS | hm, so http://paste.openstack.org/show/609973/ | 23:51 |
* mordred is rescanning through the release diff before EOD just to see if he missed anything | 23:51 | |
SpamapS | may legitimately be an API break that we didn't think through | 23:51 |
SpamapS | (weird that it didn't show up in testing before the log fix though..) | 23:52 |
mordred | SpamapS: ++ | 23:52 |
mordred | SpamapS: easy way to check - cast that pipeline name to non-unicode and see if the fail goes away ... | 23:52 |
Shrews | oh good, i'm not the only one seeing those two RPC fails | 23:53 |
mordred | Shrews: nope. we're all seeing them | 23:53 |
Shrews | "I see RPC fails" | 23:53 |
SpamapS | mordred: yeah that's what I'm doing.. trying to simultaneously trace back to see where it's being coerced to unicode | 23:53 |
mordred | SpamapS: I would not be surprised if we accidentally double-encode somewhere | 23:54 |
SpamapS | python doesn't really let you do that the bad way if you're using .encode() and .decode() | 23:55 |
mordred | well - let me say - I would not be surprised by any one of a number of things like that - including double-encoding or legit api break we missed or accidentally using EBCDIC instead of ASCII | 23:55 |
SpamapS | pipeline = tenant.layout.pipelines[event.pipeline_name.encode('utf8')] | 23:56 |
SpamapS | KeyError: 'nonexistent' | 23:56 |
SpamapS | so I think layout.pipelines is empty | 23:56 |
SpamapS | OrderedDict([('check', <Pipeline check>), ('gate', <Pipeline gate>), ('post', <Pipeline post>)]) | 23:57 |
SpamapS | It's not supposed to be there. ;-) | 23:57 |
SpamapS | duh | 23:57 |
SpamapS | 'nonexistent | 23:57 |
mordred | SpamapS: so - that keyerror should be manifesting as an RPCException but isn't | 23:58 |
SpamapS | yep | 23:58 |
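The behavior mordred expects (a lookup of an unknown pipeline surfacing as an RPC-level error rather than a bare `KeyError`) can be sketched as follows. `RPCFailure` and `get_pipeline` are hypothetical names used for illustration; they are not taken from Zuul's code.

```python
from collections import OrderedDict


class RPCFailure(Exception):
    """Hypothetical error type reported back to the RPC caller."""


def get_pipeline(pipelines, name):
    """Look up a pipeline by name, translating a miss into RPCFailure."""
    try:
        return pipelines[name]
    except KeyError:
        raise RPCFailure("Unknown pipeline: %s" % name)


if __name__ == "__main__":
    pipelines = OrderedDict([("check", "<Pipeline check>"),
                             ("gate", "<Pipeline gate>"),
                             ("post", "<Pipeline post>")])
    print(get_pipeline(pipelines, "check"))
    try:
        get_pipeline(pipelines, "nonexistent")
    except RPCFailure as exc:
        print(exc)
```

Note that under Python 3 a key produced by `.encode('utf8')` is a `bytes` object and never equals a `str` key, so the lookup in the traceback above would fail there even for pipelines that exist.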
SpamapS | gah.. EOD is at hand | 23:59 |