jeblair | SpamapS: let me grab a flashlight... | 00:04 |
---|---|---|
SpamapS | jeblair: "I'll submit what I have now, as it's failing reliably somewhat deep into the test | 00:04 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul: Refactor test_zuul_refs https://review.openstack.org/397994 | 00:06 |
SpamapS | jeblair: note that ^^ has something I'm not sure is right. There are 10 builds, but I'm only checking the first 4 that match the ZUUL_CHANGE that I want.. given the nature of the test, I thought this was probably right.. but not sure. | 00:08 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Re-enable TestNodepool.test_node https://review.openstack.org/397944 | 00:12 |
pabelanger | jeblair: Shrews: okay, I think that is a little more cleaner^ | 00:12 |
pabelanger | I'll hold off on any more tests until that one is reviews, since it add zookeeper into nodepool.py | 00:13 |
jeblair | SpamapS: yeah, those changes should trigger more than 4 jobs (10 seems strange, i wonder why it's not 12?) and we're presuming they're all getting the same information, so we only check one build from each. | 00:16 |
SpamapS | jeblair: it's possible I observed the length of self.builds mid-launch | 00:17 |
jeblair | apparently there will be 14 builds for that by the time it's done | 00:18 |
SpamapS | (Pdb) p self.builds | 00:19 |
SpamapS | [<FakeBuild project-test1 3,1 [waiting]>, <FakeBuild project-test2 3,1 [waiting]>, <FakeBuild project1-project2-integration 3,1 [waiting]>, <FakeBuild project-test1 3,1 4,1 [waiting]>, <FakeBuild project-test2 3,1 4,1 [waiting]>, <FakeBuild project1-project2-integration 3,1 4,1 [waiting]>, <FakeBuild project-test1 3,1 4,1 5,1 [waiting]>, <FakeBuild project-test2 3,1 4,1 5,1 [waiting]>, <FakeBuild project-test1 | 00:19 |
SpamapS | 3,1 4,1 5,1 6,1 [waiting]>, <FakeBuild project-test2 3,1 4,1 5,1 6,1 [waiting]>] | 00:19 |
SpamapS | y | 00:19 |
jeblair | oh right those are running builds | 00:19 |
jeblair | so that's correct | 00:20 |
jeblair | the 4 -merge builds have completed (so they are accessible in self.history and are removed from self.builds) | 00:20 |
jeblair | since this is the last message from hasChanges (and seems to match the call in 2116), i assume this is the error: | 00:23 |
jeblair | 2016-11-16 00:08:26,033 zuul.test DEBUG Checking if build <FakeBuild project-test2 3,1 4,1 5,1 [waiting]> has changes; commit_messages ['A-1']; repo_messages [u"Merge commit 'refs/changes/1/5/1' of /tmp/tmp.PQac7IgBC1/tmp37U9tW/zuul-test/upstream/org/project2 into HEAD", u'M2-1', u'C-1', u'add content from fixture', u'initial commit'] | 00:23 |
SpamapS | Right, ZUUL_CHANGE == 5 doesn't seem to be the right change anymore | 00:25 |
jeblair | it's looking for 'A-1' in that list of commit message, but not seeing it. it sees C-1, but the A and B commits are not there | 00:25 |
SpamapS | I'm still learning the fixtures, so I can't even tell how ZUUL_CHANGE gets created yet | 00:25 |
SpamapS | can't even find where self.fake_gerrit is created | 00:26 |
jeblair | SpamapS: ZUUL_CHANGE is essentially the gerrit change id, it starts at 1 and monotonically increases each time we addFakeChange in the test | 00:26 |
jeblair | so in that test, M1=1, M2=2, A=3, B=4, C=5, D=6 | 00:26 |
SpamapS | so guessing there are merger things stubbed out or just broken right now | 00:28 |
jeblair | probably broken; this shouldn't be hitting stub code | 00:29 |
SpamapS | cool, worth digging deeper then. :) | 00:31 |
* SpamapS packs some cliff bars and an extra 200m of rope. | 00:31 | |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Remove image building responsibility from nodepool.py https://review.openstack.org/397965 | 00:33 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Re-enable TestNodepool.test_node https://review.openstack.org/397944 | 00:33 |
pabelanger | jhesketh: do you mind reviewing: https://review.openstack.org/#/q/status:open+project:openstack-infra/zuul+branch:feature/zuulv3+topic:enable-tests today? | 00:36 |
jhesketh | of course | 00:37 |
* jhesketh jumps on it now before he forgots | 00:38 | |
jhesketh | *forgets | 00:38 |
jhesketh | pabelanger: awesome stuff with taking all these on btw :-) | 00:38 |
jeblair | SpamapS: i think i see the problem -- | 00:38 |
jeblair | SpamapS: that's the correct git history for project2 (the project that change 5/C is in). but we're actually looking for change A which is in project1 | 00:40 |
pabelanger | jhesketh: not an issue, happy to help where I can | 00:41 |
jhesketh | jeblair: any particular reason you didn't +3 some of pabelanger's tests as you mentioned for me yesterday? | 00:41 |
jhesketh | :-) | 00:41 |
jeblair | SpamapS: ref_has_change looks at the git repo for the project corresponding to the change you're asking for, but build.hasChanges only looks at ZUUL_PROJECT. it *probably* needs to similarly check the project corresponding to each change you're asking about | 00:42 |
jeblair | jhesketh: i +3d the ones you reviewed. i +2d the rest and left them for you to review? | 00:43 |
jhesketh | jeblair: sure... maybe I misunderstood yesterday, but did you want me to +3 'trivial' | 00:44 |
jhesketh | patches with no other +2's | 00:44 |
jhesketh | on the v3 branch | 00:44 |
jeblair | are there any? | 00:44 |
jhesketh | not sure, just working through them at the moment.. .but in general when reviewing for v3 | 00:44 |
jhesketh | and in particular for the test-enable ones | 00:45 |
jeblair | let's not -- i've -1d a couple of trivial ones that turn out not to be trivial | 00:45 |
SpamapS | jeblair: ah yes, test fixture issue. I wasn't finding much in the other end. | 00:45 |
jhesketh | okay, sure | 00:45 |
jhesketh | my apologies for misunderstanding yesterday then | 00:45 |
jeblair | jhesketh: np -- your reviews on them are appreciated so i want to make sure both of us see as many as possible :) | 00:46 |
jhesketh | ack | 00:46 |
jeblair | pabelanger: 944 lgtm, but i wonder if we're at a point where we should just change node.yaml to be like node_dib.yaml -- that might result in less test churn? | 00:49 |
jeblair | (specifically, i'm thinking if we don't have to change all occurences of 'fake-image' to 'fake-dib-image' it would be nice) | 00:50 |
pabelanger | jeblair: ya, I was thinking about that too. I think those files can be merged now | 00:51 |
jeblair | cool, worth a shot i think | 00:51 |
pabelanger | should I do that in 944 or the following patch? | 00:52 |
jeblair | pabelanger: i think either would be fine (i don't mind a following patch changing that back) | 00:53 |
pabelanger | okay, I can do that first thing in the morning | 00:53 |
openstackgerrit | Merged openstack-infra/zuul: Re-enable TestMergerRepo() class for testing https://review.openstack.org/397189 | 00:57 |
openstackgerrit | Merged openstack-infra/zuul: Re-enable test_crd_gate / test_crd_multiline / test_crd_gate_reverse https://review.openstack.org/397277 | 00:58 |
openstackgerrit | Merged openstack-infra/zuul: Re-enable test_crd_check job https://review.openstack.org/397317 | 00:58 |
openstackgerrit | Merged openstack-infra/zuul: Re-enable test_crd_branch test https://review.openstack.org/397337 | 00:59 |
openstackgerrit | Merged openstack-infra/zuul: Re-enable test_crd_cycle_join test https://review.openstack.org/397340 | 00:59 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul: Refactor test_zuul_refs and FakeBuild.hasChanges https://review.openstack.org/397994 | 01:03 |
SpamapS | jeblair: good call^ | 01:04 |
SpamapS | test passes now. Did not test to see if there was fallout from the changes... have to run out to fetch kids now | 01:04 |
jeblair | SpamapS: cool, the rest of the tests will tell us that :) | 01:05 |
*** harlowja has quit IRC | 05:55 | |
*** bhavik1 has joined #zuul | 07:01 | |
openstackgerrit | Ian Wienand proposed openstack-infra/nodepool: Catch all upload exceptions https://review.openstack.org/398148 | 07:17 |
*** abregman has joined #zuul | 07:22 | |
openstackgerrit | Ian Wienand proposed openstack-infra/nodepool: Catch all upload exceptions https://review.openstack.org/398148 | 07:49 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul: Refactor test_zuul_refs and FakeBuild.hasChanges https://review.openstack.org/397994 | 08:18 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul: Refactor test_zuul_refs and FakeBuild.hasChanges https://review.openstack.org/397994 | 08:20 |
*** saneax-_-|AFK is now known as saneax | 08:36 | |
*** hashar has joined #zuul | 08:43 | |
*** saneax is now known as saneax-_-|AFK | 08:55 | |
*** saneax-_-|AFK is now known as saneax | 09:00 | |
*** saneax is now known as saneax-_-|AFK | 09:13 | |
*** saneax-_-|AFK is now known as saneax | 09:38 | |
*** saneax is now known as saneax-_-|AFK | 09:59 | |
*** bhavik1 has quit IRC | 10:50 | |
*** abregman is now known as abregman|afk | 10:51 | |
*** abregman|afk is now known as abregman | 11:29 | |
*** _ari_ has quit IRC | 12:34 | |
*** _ari_ has joined #zuul | 12:36 | |
*** jamielennox is now known as jamielennox|away | 13:25 | |
*** abregman is now known as abregman|mtg | 13:32 | |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Replace node_dib.yaml with node.yaml for tests https://review.openstack.org/398357 | 13:40 |
*** abregman_ has joined #zuul | 13:56 | |
*** abregman|mtg has quit IRC | 13:59 | |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Replace node_dib.yaml with node.yaml for tests https://review.openstack.org/398357 | 13:59 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Re-enable TestNodepool.test_node https://review.openstack.org/397944 | 13:59 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Re-enalbe test_node_net_name test https://review.openstack.org/398368 | 14:03 |
*** abregman_ has quit IRC | 14:04 | |
*** abregman has joined #zuul | 14:05 | |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Enable test_node_vhd_image test https://review.openstack.org/398369 | 14:10 |
pabelanger | mordred: do you mind helping +3 jeblair stack of changes for nodepool? https://review.openstack.org/#/c/397952 Would save me a rebase | 14:12 |
*** Cibo has joined #zuul | 14:18 | |
pabelanger | actually, I can rebase for now | 14:22 |
mordred | pabelanger: looking | 14:25 |
mordred | pabelanger: done | 14:26 |
pabelanger | mordred: danke | 14:28 |
pabelanger | Shrews: going to take a stab at getting devstack job updated with zookeeper | 14:29 |
openstackgerrit | Merged openstack-infra/nodepool: Fix race condition in build cleanup https://review.openstack.org/397975 | 14:30 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Enabled zookeeper for devstack jobs https://review.openstack.org/398379 | 14:31 |
mordred | pabelanger: that looks very straightforward ^^ :) | 14:33 |
openstackgerrit | Merged openstack-infra/nodepool: Remove Zookeeper per-test fixture https://review.openstack.org/397948 | 14:33 |
openstackgerrit | Merged openstack-infra/nodepool: Reduce kazoo logging in tests https://review.openstack.org/397952 | 14:33 |
openstackgerrit | Merged openstack-infra/nodepool: Remove discover from test-requirements https://review.openstack.org/345801 | 14:33 |
pabelanger | mordred: ya, untested, lets see what the job does | 14:33 |
pabelanger | sadly no centos package yet | 14:33 |
openstackgerrit | Merged openstack-infra/nodepool: Remove image building responsibility from nodepool.py https://review.openstack.org/397965 | 14:33 |
*** abregman has quit IRC | 14:54 | |
*** abregman has joined #zuul | 15:16 | |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Enabled zookeeper for devstack jobs https://review.openstack.org/398379 | 15:17 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Replace node_dib.yaml with node.yaml for tests https://review.openstack.org/398357 | 15:31 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Enable test_node_vhd_image test https://review.openstack.org/398369 | 15:31 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Re-enable TestNodepool.test_node https://review.openstack.org/397944 | 15:31 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Re-enable test_node_net_name test https://review.openstack.org/398368 | 15:31 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Re-enable test_node_vhd_and_qcow2 test https://review.openstack.org/398416 | 15:31 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: Re-enable test: test_dib_image_delete https://review.openstack.org/398418 | 15:41 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Re-enable test_dib_upload_fail test https://review.openstack.org/398426 | 15:45 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: Re-enable test: test_config_validate https://review.openstack.org/398427 | 15:45 |
*** abregman has quit IRC | 15:55 | |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: Re-enable test: test_image_build https://review.openstack.org/398437 | 15:57 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Re-enable test_subnodes test https://review.openstack.org/398438 | 15:59 |
Shrews | pabelanger: what does the job-create nodepool command supposed to do? | 15:59 |
Shrews | yes, i know, create a job... thanks mordred :-P | 15:59 |
Shrews | but, what is the job? | 15:59 |
pabelanger | Shrews: we use that to setup auto-holds in nodepool | 16:00 |
pabelanger | so, when a job files, nodepool will not delete the node, but flag it as hold | 16:00 |
*** abregman has joined #zuul | 16:01 | |
Shrews | hrm, that seems very un-zookeeper related | 16:01 |
* Shrews tries re-enabling | 16:01 | |
pabelanger | agreed | 16:01 |
mordred | Shrews: :) | 16:01 |
Shrews | cool, test_job_create and test_job_delete still work | 16:02 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: Re-enable tests: test_job_create, test_job_delete https://review.openstack.org/398440 | 16:02 |
pabelanger | Yay | 16:03 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Enable test_node_az test https://review.openstack.org/398442 | 16:05 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: Remove test: test_snapshot_image_update https://review.openstack.org/398445 | 16:11 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Re-enable test_node_ipv6 test https://review.openstack.org/398446 | 16:12 |
* Shrews wonders how much of the infra compute resources is being used by pabelanger and Shrews | 16:13 | |
pabelanger | :) | 16:13 |
pabelanger | 698 nodes in use right now | 16:13 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Re-enable test_node_delete_success test https://review.openstack.org/398448 | 16:14 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Re-enable test_node_delete_failure test https://review.openstack.org/398450 | 16:17 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Re-enable test_db test https://review.openstack.org/398451 | 16:18 |
pabelanger | I don't want to jinx this, but it is going pretty smoothly the re-enabling of tests for test_nodepool.py | 16:18 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Re-enable test_leaked_node test https://review.openstack.org/398458 | 16:21 |
jeblair | Shrews: https://review.openstack.org/397944 is an important one i want to make sure you review | 16:22 |
Shrews | pabelanger: comment for you on 397944 | 16:24 |
pabelanger | checking | 16:24 |
Shrews | jeblair: yeah, just looked at it | 16:24 |
pabelanger | Shrews: sure, I can change that if not correct | 16:25 |
jeblair | Shrews: also be aware of https://review.openstack.org/398357 which may affect test re-enablement | 16:25 |
openstackgerrit | Monty Taylor proposed openstack-infra/nodepool: Use the name parameter from Task instead of class name https://review.openstack.org/398461 | 16:25 |
openstackgerrit | Monty Taylor proposed openstack-infra/nodepool: Remove need for findNetwork https://review.openstack.org/398462 | 16:25 |
jeblair | Shrews, pabelanger: sounds like a good follow-up patch | 16:26 |
pabelanger | ack | 16:26 |
Shrews | pabelanger: yeah, let's not rebase the entire chain | 16:26 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Re-enable test_job_start_event test https://review.openstack.org/398463 | 16:26 |
mordred | Shrews, pabelanger: http://logs.openstack.org/79/398379/2/check/gate-dsvm-nodepool/5acebe8/logs/screen-nodepool-builder.txt.gz#_2016-11-16_15_33_44_884 | 16:26 |
mordred | why is there an "osic-cloud1" in the image string? I can't find ... oh wait a sec | 16:27 |
* jeblair watches firefox thrash | 16:27 | |
Shrews | mordred: File Not Found | 16:27 |
Shrews | fun | 16:27 |
mordred | ok - I will ask the question for real now - I don't see osic anywhere in the repo | 16:28 |
Shrews | oh, found now | 16:28 |
pabelanger | it was run on osic | 16:29 |
jeblair | well, i still can't see that file -- but the builder does store the hostname as a field in zk to indicate what host it was built on... but i don't know why it would be in the image name | 16:29 |
jeblair | mordred: oh! | 16:29 |
jeblair | mordred: still just guessing here -- but i think the devstack test does some awk/grep stuff on nodepool command output. i bet we need to update the field order | 16:29 |
Shrews | yeah, the builder hostname is definitely NOT used in the image name | 16:30 |
jeblair | like, it's looking for an image name, but it's pulling the build host out of the output table instead of the image name | 16:30 |
pabelanger | Still downloading log file :/ | 16:30 |
mordred | I'm dumb. yah. that's the hostname | 16:30 |
mordred | ubuntu-trusty-osic-cloud1-disk-5402390 | 16:30 |
clarkb | also figuring out why that logfile is so large now wouldbe good | 16:31 |
mordred | that's a kazooclient log message - I think it' sjust saying it's talking to the zookeeper on ubuntu-trusty-osic-cloud1-disk-5402390 | 16:31 |
pabelanger | kazoo.client I suspect | 16:31 |
pabelanger | I'll update the logging.conf file to INFO for it | 16:31 |
jeblair | pabelanger: ++thx | 16:31 |
clarkb | 12mbcompressed | 16:31 |
* jeblair kills ff | 16:32 | |
jeblair | SpamapS, mordred: the only other thought i have on 397994 is that we aren't really going to use zuul_ref for its intended purpose anymore (which is to be the ref that zuul-cloner uses to fetch things from the merger -- all of that is gone in v3). we could start removing it now, however, there's a possibility we might want to keep it, or something like it, to uniquely identify build configurations (though we would probably call it something ... | 16:39 |
jeblair | ... else). (it's one of the things we can use to allow jobs in the same build set to stash and fetch artifacts with a shared key, for instance). so i'm inclined to say maybe we leave that for the moment and see if we think we need that later. we could log this as a backlog item: "remove or update zuul_ref env variable". | 16:39 |
mordred | jeblair: I agree with keep it for now | 16:40 |
*** abregman has quit IRC | 16:40 | |
SpamapS | jeblair: yeah I had wondered about that too. Still ramping up in v3's changes so I wasn't sure enough to say. | 16:41 |
SpamapS | jeblair: other than inform zuul-cloner of how to find merged things, ZUUL_REF doesn't really do anything, right? | 16:41 |
SpamapS | the way I worded that was sort of like Homer haggling with Professor Frink... "Well it _ONLY_ transports _MATTER_" | 16:42 |
openstackgerrit | Merged openstack-infra/zuul: Refactor test_zuul_refs and FakeBuild.hasChanges https://review.openstack.org/397994 | 16:42 |
jeblair | SpamapS: yep :) | 16:43 |
* SpamapS updates workboard | 16:44 | |
timrc | Storyboard fail: | 16:45 |
clarkb | jeblair: will there not be a ref I can fetch locally to reproduce a job on my laptop? | 16:45 |
timrc | "400: GET /api/v1/users/preferences: Invalid input for field/attribute user_id. Value: 'preferences'. unable to convert to int " | 16:45 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Enabled zookeeper for devstack jobs https://review.openstack.org/398379 | 16:46 |
SotK | timrc: Sorry about that, its a known bug that happens when you've not accessed storyboard for a while. Refreshing the page should fix it. | 16:47 |
timrc | SpamapS: There's a few worklist/boards with the word "zuul" in them but the one I should be paying attention to is: https://storyboard.openstack.org/#!/board/41? | 16:47 |
timrc | SotK: Ah, should have tried that first. I just logged out and back in. | 16:47 |
jeblair | clarkb: nope :( parts of reproducing get easier (ansible!), but parts are harder (we push rather than pull git repos). so i think we'll have to come up with something to fill that gap. | 16:48 |
jeblair | clarkb: perhaps the merger can record what it does, and we can somehow include that (as a series of git commands?) in the build artifacts | 16:48 |
clarkb | jeblair: I guess I had assumed (incorrectly) that pushing wouldnt remove ref hosting | 16:48 |
clarkb | and instead just be an optimization for jobs running | 16:49 |
clarkb | so make refs as today, and then also push them | 16:49 |
pabelanger | would it be much work to support both? | 16:50 |
jeblair | clarkb: we *could* implement it that way, but the way i did it has some advantages: first, the merger operations can happen in parallel, since each build gets its own copies of the git repos. second, we don't have zuul refs to clean up. (currently, we need to delete all our zuul refs every month or so to get reasonable performance from the mergers). third, we don't have to tell users to run an external facing service (apache). | 16:52 |
clarkb | I think the first thing could still happen if refs were no longer unique per buildset (can still fetch a ref for particular jobs to reproduce them) | 16:54 |
jeblair | (keep in mind that in many cases, the "reproduce a job" use case is actually "i already have repos on my workstation, i want to run the job". that gets *easier* with this setup. clarkb's example is different, but still valid, and we should support it, of course.) | 16:54 |
SpamapS | timrc: yes, 41 | 16:54 |
jeblair | clarkb: i'm not sure i follow | 16:55 |
clarkb | jeblair: make a zuul ref for each job and do them in parallel | 16:55 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Re-enable test_job_end_event test https://review.openstack.org/398493 | 16:56 |
clarkb | the other two issues are harder to solve because git | 16:56 |
jeblair | clarkb: oh, sorry, the reason they can be parallel is that they are literally happening in different git trees. the merge operation dirties the working tree, so you can only do it one at a time. current v3 implementation clones all necessary git repos into the jobdir and performs the speculative merges there. | 16:56 |
mordred | I like the idea of recording the merge operations in some manner - and having a thing someone can run that will consume that output and produce the same git repo state | 16:57 |
SpamapS | well... | 16:57 |
mordred | it seems like a thing that would be consumable in a thing like reproduce.sh - or as a standalone | 16:57 |
SpamapS | you could just have it push things to some archival "all the merges recently done" spot | 16:57 |
jeblair | SpamapS: we did that 3 years ago for several months and killed gerrit :( | 16:57 |
SpamapS | like, optionally, as a thing that is used for shops that want to be able to grab them. | 16:57 |
clarkb | jeblair: I see so you'd need separate hosting of every job dir | 16:58 |
jeblair | SpamapS: so yes, possible, but a significant system administration overhead | 16:58 |
mordred | jeblair: yah - it might have to be a separate thing that's only used for keeping recent historical merges | 16:58 |
SpamapS | jeblair: I was just thinking a dumb dir somewhere | 16:58 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Re-enable test_job_auto_hold_success / test_job_auto_hold_failure https://review.openstack.org/398495 | 16:58 |
mordred | rather than pushing back to gerrit | 16:58 |
mordred | or the existing git farm | 16:58 |
SpamapS | something that has an hourly cron that removes everything > 24 hours old | 16:58 |
jeblair | mordred: sure, i'm just saying that kind of git repo management is *hard*. it is *harder than we are willing to do*. | 16:58 |
mordred | totes | 16:59 |
jeblair | i would not want to inflict that on our users :) | 16:59 |
jeblair | let me suggest this: | 16:59 |
SpamapS | Seems like a decent optional add-on to support if shops find they aren't comfortable with pseudo-reproductions. | 16:59 |
jeblair | if a user wants that feature, they can write a role (and put in in their pre-run playbook) that pushes the job's repo state to their big git history machine | 16:59 |
jeblair | SpamapS: ^ ya | 16:59 |
persia | SpamapS: Rather than >24 hours, could it remove speculative merges known unreliable, speculative merges already accomplished, and speculative merges superceded? The problems with time include holidays, timezones, etc. | 16:59 |
SpamapS | I personally have never pulled a ref to reproduce something... I usually just git review -d. | 16:59 |
SpamapS | and then rebase | 17:00 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Re-enable test_job_auto_hold_* tests https://review.openstack.org/398495 | 17:00 |
mordred | SpamapS: same here | 17:00 |
clarkb | I pull all the time... | 17:00 |
clarkb | but I deal with "I camt reproduce this locally" as a job hazard | 17:00 |
mordred | jeblair: but I agree - an ansible role that can | 17:00 |
jeblair | mordred, clarkb, SpamapS: meanwhile, we can also do the 'record what the merger does' idea and stuff it in some ansible variables so that something like reproduce.sh can output the git commands | 17:00 |
mordred | gah | 17:00 |
SpamapS | jeblair: I like the idea of a standard library of things that can be pulled in according to needs. | 17:00 |
jeblair | SpamapS: yep | 17:01 |
mordred | SpamapS: ++ | 17:01 |
jeblair | we can even write that role for folks, which means we can put a nice warning label on it. :) | 17:01 |
SpamapS | Please somebody write this down in a story. | 17:01 |
clarkb | the use case is also met if there is a simple way to produce a ref locally | 17:02 |
SpamapS | It sounds like something that may end up being a regression for clarkb | 17:02 |
jeblair | yes, i do not think we should run the big git machine for storing refs in infra. i think for clarkb's case, we should do the 'output git commands' thing. | 17:02 |
mordred | and then see how angry that makes clarkb | 17:02 |
mordred | and fix if the anger level exceeds the anger level that etherpad produces :) | 17:03 |
clarkb | ugh dont remind me | 17:03 |
clarkb | npm ehy?! | 17:03 |
* mordred hands clarkb a nice fluffy bunny | 17:03 | |
SpamapS | Somebody write a paper to define the etherpad anger scale. | 17:03 |
jeblair | SpamapS: i will storify. | 17:04 |
jeblair | but would have loved it if someone else volunteered. | 17:04 |
SpamapS | 0 being "I don't have to run etherpad". 10 being "I run etherpad for OpenStack summits" | 17:04 |
timrc | I think SpamapS inadvertently highlighted a potential issue... thinking of workflow in terms of git review. If zuulv3 is going to cater to things like GH, we shoul be thinking about how developers on that platform work with changes. | 17:05 |
jeblair | timrc: for this purpose, there is no difference. git commits are fetched regardless. they are called different things. | 17:07 |
SpamapS | timrc: that's a good point. There's no git review -d there. But there should be sha's that one can recover from GH. | 17:07 |
SpamapS | I think timrc's point is that there's no record of the sha's in PR's | 17:07 |
jeblair | to be clear, there is no 'git review -d' in infra's zuul either. | 17:07 |
SpamapS | eh, gerrit? | 17:08 |
jeblair | we don't use tha sha's for gerrit | 17:08 |
jeblair | zuul fetches named refs from gerrit to test changes. it will also fetch named refs from github. | 17:08 |
mordred | yah - pulls have named refs associated with them | 17:08 |
SpamapS | I think users might have a hard time determining the named refs. | 17:09 |
SpamapS | Since PR sources get push -f'd over | 17:09 |
mordred | git fetch origin pull/{ID}/head | 17:09 |
SpamapS | that {ID}, is that the PR #? | 17:09 |
mordred | yes | 17:09 |
jeblair | yeah, in gerrit they are named refs/changes/34/1234. in github they are named /pull/1234/head | 17:10 |
SpamapS | so you're screweed when somebody has pushed over it. | 17:10 |
mordred | yup | 17:10 |
SpamapS | but that sha is still there | 17:10 |
mordred | but that's just one of teh limitations of github | 17:10 |
SpamapS | and could be recovered | 17:10 |
mordred | github actually garbage collects fairly aggressively | 17:10 |
mordred | so it might not be there | 17:10 |
SpamapS | of course it does | 17:10 |
SpamapS | yay GH | 17:10 |
timrc | Well we'll ahve a reflog in the cloner right :)? | 17:10 |
jeblair | i do not want to try to improve github. we should use/expose the service they provide | 17:10 |
mordred | yah | 17:11 |
timrc | jeblair: No no fair point, it just got me thinking. | 17:11 |
mordred | ++ | 17:11 |
Zara | (+1 from this lurker) | 17:11 |
SotK | +1 from this one too | 17:11 |
jeblair | SpamapS: most of the time people add commits to prs though. that has a correspondence with gerrit patchsets and maintains reproducibility. | 17:13 |
jeblair | well, i'll say "many times". i do not know if it is most of the time. | 17:13 |
clarkb | jeblair: a lot of projects enforce a rebase and squash workflow to keep things "clean" (which is somewhat funny bceause thats like the biggest reasion people hate gerrit except that gerrit actually does preserve all that unclean history too whereas github doesn't) | 17:14 |
jeblair | clarkb: thus my self-correction. | 17:14 |
jeblair | at any rate, we'll be as reproducible as the underlying system | 17:15 |
jeblair | and we can record the shas in our merger-recorder so that if something changes, it is detectable. | 17:16 |
pabelanger | Shrews: any ideas? I am trying to use zk.deleteBuild() but getting an kazoo.exception: http://paste.openstack.org/show/589484/ | 17:16 |
pabelanger | maybe I am not using the right build_number | 17:17 |
pabelanger | but I don't think so | 17:17 |
SpamapS | https://www.dropbox.com/s/d3bhhwcmevgwj1n/GerritVSGithub.jpg?dl=0 | 17:19 |
* SpamapS felt a need to express how he feels about these two things | 17:19 | |
pabelanger | Shrews: ah, but using self.client.delete(path, recursive=True) for deleteBuild() the exception goes away | 17:19 |
jeblair | SpamapS, clarkb, mordred: https://storyboard.openstack.org/#!/story/2000801 those tasks have notes -- click the arrow to see them | 17:21 |
SpamapS | jeblair: I think a gating workflow will cure people of their "add commits, squash later" behavior relatively quickly. ;) | 17:21 |
jeblair | SpamapS: :) | 17:21 |
Zara | SpamapS: :D | 17:21 |
* SotK hopes so | 17:22 | |
jeblair | pabelanger: are you working on the same thing as in https://review.openstack.org/398418 ? | 17:23 |
pabelanger | jeblair: I am but for test_nodepool | 17:24 |
pabelanger | I think I did the same thing | 17:24 |
pabelanger | let me push up the patch | 17:24 |
jeblair | pabelanger, Shrews: see comment in https://review.openstack.org/398418 | 17:25 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Re-enable test_image_delete test https://review.openstack.org/398502 | 17:25 |
pabelanger | jeblair: Shrews: okay, ^ could use some eyes. That's the first test I've had to rewrite to not use the database | 17:26 |
pabelanger | I am a little concerned we are actually testing the image was deleted | 17:26 |
jeblair | pabelanger: i wonder if that test should go away | 17:26 |
jeblair | pabelanger: i suspect it has been superceded by a test in test_builder (where that functionality actually resides now) | 17:27 |
pabelanger | sure, let me check it quickly | 17:27 |
jeblair | pabelanger: cool thx | 17:27 |
clarkb | pabelanger: I too have comments on https://review.openstack.org/#/c/397944/6 happy to have them be follow up items, but wanted you to see them both before I approve that change | 17:27 |
clarkb | (and I intend on approving the change just as soon as we can have a quick glance and agree that nothing needs to be fixed in that patchset) | 17:27 |
pabelanger | jeblair: looks some are still disabled in test_builder, but I don't see one specific to deleting an image. I can remove it from test_nodepool now, but we'll likely need to increase code coverage for test_builder | 17:30 |
clarkb | jeblair: pabelanger I agree that that test should live in test_builder.py now | 17:31 |
clarkb | so if there isn't such a test maybe we move it as part of removing it from test_nodepool.py? | 17:31 |
pabelanger | clarkb: re: 397944. I think you are correct, we shouldn't only compare the 0th element | 17:35 |
clarkb | pabelanger: but its ok to fix in a followup because our zk config lists only have a single elemtn currently right? | 17:36 |
clarkb | pabelanger: or does it need to be addressed in that base change? | 17:36 |
pabelanger | clarkb: today, we only have a single element | 17:36 |
pabelanger | I'll do a follow up then | 17:37 |
clarkb | kk | 17:37 |
*** hashar has quit IRC | 17:38 | |
pabelanger | clarkb: jeblair: So, I think we can delete or move the remaining tests in test_nodepool. test_building_image_cleanup_on_start / test_handle_dib_build_gear_disconnect | 17:38 |
pabelanger | gear for sure can be deleted | 17:38 |
pabelanger | clean up maybe into test_builder | 17:38 |
jeblair | pabelanger: the cleanup may not be necessary either -- there's a janitor thread running that should delete any failed image builds, so we don't need to do that on start any more. | 17:42 |
clarkb | though having a test that things do get cleaned up by that thread is probably a good idea | 17:42 |
clarkb | so would change to build enough images that one gets superceded, wait 2*cleanup interval, check image is gone | 17:42 |
openstackgerrit | Merged openstack-infra/nodepool: Re-enable TestNodepool.test_node https://review.openstack.org/397944 | 17:43 |
openstackgerrit | Merged openstack-infra/nodepool: Replace node_dib.yaml with node.yaml for tests https://review.openstack.org/398357 | 17:43 |
openstackgerrit | Merged openstack-infra/nodepool: Re-enable test_node_net_name test https://review.openstack.org/398368 | 17:43 |
clarkb | pabelanger: ^ I am approving things, followup to fix the zk startup would be great | 17:43 |
pabelanger | clarkb: sure, I can do that now | 17:43 |
pabelanger | Hmm, looks like a race: http://logs.openstack.org/46/398446/1/check/gate-nodepool-python27-db-ubuntu-xenial/b9bb5c5/console.html | 17:44 |
pabelanger | need to dig into that | 17:44 |
jeblair | clarkb: yes, though an analog to this case would be "start builder, kill builder during image build, start builder and verify previous 'building' image was deleted" | 17:44 |
clarkb | oh right the state there is what is important | 17:45 |
clarkb | jeblair: pabelanger also while people are reviewing nodepool things https://review.openstack.org/#/c/370455/ would be nice to fix on master | 17:46 |
jeblair | aprvd | 17:47 |
clarkb | ty | 17:47 |
clarkb | (especially since we have to live with subnodes for a little while longer aiui) | 17:47 |
openstackgerrit | Merged openstack-infra/nodepool: Enable test_node_vhd_image test https://review.openstack.org/398369 | 17:48 |
openstackgerrit | Merged openstack-infra/nodepool: Re-enable test_node_vhd_and_qcow2 test https://review.openstack.org/398416 | 17:48 |
openstackgerrit | Merged openstack-infra/nodepool: Re-enable test_dib_upload_fail test https://review.openstack.org/398426 | 17:49 |
clarkb | pabelanger: http://logs.openstack.org/46/398446/1/check/gate-nodepool-python27-db-ubuntu-xenial/b9bb5c5/console.html maybe a race in the new enabled jobs? I think I will hold off on approving things until we can sort that out | 17:50 |
clarkb | *more things | 17:50 |
pabelanger | clarkb: agreed, trying to figure out why now | 17:51 |
openstackgerrit | Merged openstack-infra/nodepool: Fix subnode deletion https://review.openstack.org/370455 | 17:51 |
pabelanger | clarkb: Shrews: It looks like we are upload the same image multiple times: http://logs.openstack.org/46/398446/1/check/gate-nodepool-python27-db-ubuntu-xenial/b9bb5c5/console.html#_2016-11-16_16_18_45_220106 | 17:52 |
clarkb | pabelanger: yes thats the image that fails to upload | 17:53 |
pabelanger | Ah | 17:53 |
clarkb | pabelanger: so the builder retries it, what should happen is the other image is uploaded successfully and nodepool boots nodes in that provider instead | 17:53 |
clarkb | pabelanger: basically this is testing that if one provider can't get images that we successfully use other providers that can | 17:53 |
pabelanger | looks like we don't attempt the upload of the fake-provider2 | 17:54 |
clarkb | pabelanger: interesting, but that doesn't happen always because https://review.openstack.org/398426 passed testing enough times to merge | 17:55 |
openstackgerrit | Merged openstack-infra/nodepool: Re-enable test_subnodes test https://review.openstack.org/398438 | 17:57 |
clarkb | pabelanger: but I agree http://logs.openstack.org/46/398446/1/check/gate-nodepool-python27-db-ubuntu-xenial/b9bb5c5/console.html#_2016-11-16_16_18_45_281759 | 17:57 |
openstackgerrit | Merged openstack-infra/nodepool: Enable test_node_az test https://review.openstack.org/398442 | 17:57 |
clarkb | pabelanger: shows that we never get the second image which should succeed there | 17:57 |
pabelanger | Ya | 17:58 |
pabelanger | think we have an issue in our imageupload loop | 17:58 |
clarkb | ya this looks like a real bug reading the logs | 17:58 |
pabelanger | interesting | 17:58 |
pabelanger | http://paste.openstack.org/show/589491/ | 17:59 |
pabelanger | some lock exceptions when I run it locally again | 17:59 |
clarkb | at least if you grep out the builder logs only the provider that should fail has upload attempts | 17:59 |
clarkb | if I were guessing the builder doesn't round robin its looping? I was just reading that code but I don't remember off the top of my head | 18:01 |
clarkb | so this test will succeed if provider2 happesn to go first for some reason | 18:01 |
clarkb | and fail if 01 goes first? (this is theorizing) | 18:01 |
pabelanger | clarkb: I think you are right | 18:02 |
pabelanger | all my local working tests, fake-provide1 is first | 18:02 |
pabelanger | err | 18:02 |
pabelanger | fake-provider2 is first | 18:03 |
clarkb | Shrews: ^ you may have input | 18:03 |
pabelanger | let me see if I can switch it | 18:03 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Enabled zookeeper for devstack jobs https://review.openstack.org/398379 | 18:08 |
jeblair | clarkb, Shrews, pabelanger: yeah, i think we need some more exception handling there. i will write a patch. | 18:13 |
*** harlowja has joined #zuul | 18:14 | |
clarkb | oh right this maybe the case I mentioned yesterday in my review | 18:15 |
clarkb | bceause its going to raise an exception during upload | 18:15 |
jeblair | clarkb: yep :) | 18:17 |
pabelanger | neat, we know the issue | 18:21 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Make image uploads separable https://review.openstack.org/398524 | 18:22 |
jeblair | clarkb, pabelanger, Shrews: ^ | 18:22 |
jeblair | i'm going to build on that a little more in a subsequent change as well | 18:22 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Remove DibImage from the database https://review.openstack.org/398525 | 18:24 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Add a fallback exception handler for UploadWorker https://review.openstack.org/398526 | 18:27 |
pabelanger | +2, want clarkb to also see | 18:27 |
clarkb | yup about to pull it up | 18:27 |
pabelanger | 2016-11-16 18:27:15,781 INFO nodepool.image.build.ubuntu-dib: * Failed to connect to bootstrap.pypa.io port 443: Connection timed out | 18:30 |
pabelanger | guess we should update nodepool devstack test to use the cached version :) | 18:30 |
mordred | pabelanger: :) | 18:31 |
clarkb | or juist fix neutron | 18:31 |
clarkb | because really we should have a working neutron | 18:31 |
pabelanger | is that why we are failing to download it? | 18:32 |
clarkb | yes | 18:33 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Add a fallback exception handler for CleanupWorker https://review.openstack.org/398527 | 18:33 |
clarkb | well I am 95% sure at least | 18:33 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Add a fallback exception handler for CleanupWorker https://review.openstack.org/398527 | 18:33 |
clarkb | guessing the job ran on osic and nodepool uses neutron and neutron breaks ipv4 internet connectivity on osic | 18:33 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Add a fallback exception handler for BuildWorker https://review.openstack.org/398528 | 18:35 |
pabelanger | clarkb: Ya, osic | 18:35 |
mordred | clarkb: have you looked at kevinbenton's devstack patch yet? | 18:36 |
clarkb | mordred: not yet | 18:36 |
clarkb | it was being iterated on heavily last night and I had to get dinner and stuff | 18:36 |
mordred | clarkb: https://review.openstack.org/#/c/398012/ seems to be no longer heavily iterated upon | 18:37 |
mordred | which is the new patch | 18:37 |
clarkb | cool will take a look after reviewing these nodepool fixes | 18:37 |
*** abregman has joined #zuul | 18:37 | |
*** abregman is now known as abregman|afk | 18:37 | |
mordred | I, on the other hand, am going to the dentist | 18:38 |
mordred | so you enjoy reviewing those patches :) | 18:38 |
morgan_ | mordred: reconfiguring my dev env, hopefullt will have ksa stuff cleaned up soo | 18:38 |
morgan_ | mordred: enjoy the dentist. =/ | 18:38 |
jeblair | er, i'm going to rejigger 524 | 18:39 |
jeblair | i don't think we need the stopprocessingexception | 18:39 |
clarkb | jeblair: ok if you do that maybe remove the new newline before the run() docstring? | 18:40 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Add a fallback exception handler for BuildWorker https://review.openstack.org/398528 | 18:41 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Make image uploads separable https://review.openstack.org/398524 | 18:41 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Add a fallback exception handler for CleanupWorker https://review.openstack.org/398527 | 18:41 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Add a fallback exception handler for UploadWorker https://review.openstack.org/398526 | 18:41 |
jeblair | clarkb: sorry | 18:41 |
clarkb | its still valid python so not a huge deal | 18:42 |
clarkb | jeblair: pabelanger I want to read the test logs for 524 before posting a review (to double check we ran both of them and just didn't get lucky running provider2 first again, but otherwise looks good) | 18:45 |
pabelanger | clarkb: I can rebase our current failure at top of that stack | 18:46 |
clarkb | pabelanger: I don't think you need to | 18:46 |
clarkb | we will just approve jeblair's first | 18:46 |
pabelanger | k | 18:46 |
clarkb | oh because the test isn't enabled right now... | 18:46 |
pabelanger | right | 18:46 |
clarkb | ok ya lets just approve it then we can recheck your change and double check the logs there | 18:46 |
pabelanger | ack | 18:46 |
Shrews | jeblair: may i ask for a bit more explanation on your comment on 398418? | 18:49 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Make image cleanup separable https://review.openstack.org/398535 | 18:49 |
jeblair | Shrews: certainly! unless i'm misunderstanding, i think that change removes a cli command which i think is still necessary. | 18:50 |
jeblair | Shrews: oh i did misunderstand | 18:50 |
jeblair | Shrews: i see my mistake now :) | 18:50 |
jeblair | Shrews: sorry about that. +2. :) | 18:51 |
Shrews | jeblair: ok. i think that failure on that change might be exposing some testing overlap. getting 2 results vs. the 1 we expect. will need to investigate | 18:51 |
Shrews | (assuming i grok the failure correctly) | 18:52 |
jeblair | Shrews: which failure? | 18:53 |
Shrews | jeblair: oh, a different review. sorry | 18:54 |
Shrews | (too much stuff happened while i was at lunch) | 18:54 |
jeblair | yay stuff! | 18:56 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: Re-enable test: test_dib_image_delete https://review.openstack.org/398418 | 18:58 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: Re-enable test: test_config_validate https://review.openstack.org/398427 | 18:59 |
Shrews | jeblair: this is review that had the failure i was referring to: https://review.openstack.org/398437 | 18:59 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Make image building separable https://review.openstack.org/398542 | 19:02 |
Shrews | oh, i see what it is | 19:03 |
Shrews | duh | 19:03 |
*** Shuo has joined #zuul | 19:05 | |
jeblair | Shrews: btw, you can probably just do waitForImage there (the upload should happen automatically; and since we're removing the image-upload command, that's the expected workflow anyway (ie, i think that's maybe what the test should test)) | 19:06 |
clarkb | pabelanger: can you rereview https://review.openstack.org/#/c/398524/2 and approve if it looks good to you? then we can recheck your change | 19:11 |
Shrews | jeblair: ack. i don't understand why, though, when i run the test_image_build test alone, it succeeds, but when i run the entire test_commands.py suite, an additional fake image exists | 19:11 |
jeblair | Shrews: i. um. | 19:12 |
Shrews | exactly | 19:12 |
pabelanger | clarkb: ack | 19:12 |
jeblair | Shrews: i get that too. | 19:13 |
Shrews | jeblair: it may be a race... the first one created from _useBuilder just isn't ready yet? | 19:14 |
Shrews | when run alone | 19:14 |
jeblair | that sounds plausible | 19:15 |
Shrews | mm, nah, i don't think that's it | 19:15 |
jeblair | Shrews: oh | 19:16 |
jeblair | Shrews: it's both a scheduled and manual build | 19:16 |
Shrews | right. so the scheduled one just hasn't been hit by the scheduler yet | 19:17 |
Shrews | ?? | 19:17 |
jeblair | yep | 19:17 |
Shrews | hah | 19:17 |
jeblair | i think this worked before because the builder didn't schedule its own builds | 19:18 |
Shrews | so i should waitForImage() before doing the manual build request, then ignore the scheduled build | 19:18 |
jeblair | we don't have a way to convince it not to do that. so we may need to alter this test to expect two builds. | 19:19 |
jeblair | Shrews: yep | 19:19 |
Shrews | exciting | 19:19 |
jeblair | start; wait for first; run manual; wait for second (will probably need to change waitForBuild to do this); verify 2 builds | 19:19 |
Shrews | jeblair: didn't you just suggest to remove waitForBuild? | 19:20 |
jeblair | Shrews: yeah, sorry, meant to type waitForImage there | 19:22 |
Shrews | hrm, this is where the missing 'count' param for getMostRecentImageUpload() would have been useful | 19:22 |
*** jamielennox|away is now known as jamielennox | 19:26 | |
Shrews | i guess i can give it a list of IDs to ignore | 19:26 |
SpamapS | bleh.. | 19:37 |
SpamapS | jeblair: fighting a bit with the gear text friendly things | 19:38 |
SpamapS | inheritance really just makes it pervasive up and down all the classes. :-P | 19:38 |
SpamapS | every attribute ends up having to be managed by getters/setters .. and data, in particular, (the response from worker->client) is complicated because it's an array that is mutated a lot by reference. | 19:39 |
jeblair | SpamapS: can you just operate on 'name' using @property? | 19:40 |
openstackgerrit | Merged openstack-infra/nodepool: Make image uploads separable https://review.openstack.org/398524 | 19:40 |
SpamapS | jeblair: name being assumed to be utf-8 is a relatively easy change. | 19:40 |
SpamapS | it's all the other things that resist. :-P | 19:41 |
SpamapS | let me submit the name change by itself. | 19:41 |
Shrews | jeblair: LOL/CRY... so, getMostRecentImageUpload() in waitForImage() is always getting the scheduled upload in that case (since it happens after the manual build request). | 19:41 |
openstackgerrit | Merged openstack-infra/nodepool: Add a fallback exception handler for UploadWorker https://review.openstack.org/398526 | 19:41 |
jeblair | oh, i misunderstood what you were saying (i thought you meant "i can't just @property name, i have to __setattr__ the whole class") | 19:41 |
Shrews | i'm obviously doing something wrong | 19:42 |
Shrews | :) | 19:42 |
jeblair | Shrews: that should be good enough to wait for the first (scheduled) upload to finish, then if you submit a manual request, we do still need something to wait for that (either a modification to that function, or just a loop over getmostrecentupload until it returns something different, maybe) | 19:43 |
SpamapS | jeblair: What I'm having to do is make a getter/setter for everything that wants to be different in Text(Client/Worker/Job/WorkerJob) | 19:46 |
SpamapS | jeblair: and then also make a binary accessor for times when you actually want the binary no matter what. | 19:47 |
Shrews | jeblair: actually, i'm not entirely convinced that getMostRecentImageUpload is working properly | 19:48 |
openstackgerrit | Clark Boylan proposed openstack-infra/zuul: Don't retry when using synchronize module https://review.openstack.org/398558 | 19:49 |
clarkb | jeblair: ^ there we go | 19:49 |
jeblair | SpamapS: hrm, i thought with an inheritance sequence where Text things override Binary things we would just not have to do anything tricky for the binary bits. | 19:52 |
jeblair | Shrews: how so? | 19:52 |
SpamapS | jeblair: the problem is that we pass around jobs internally and make packets out of them | 19:52 |
SpamapS | and that _always_ has to be binary | 19:52 |
*** hasharAway is now known as hashar | 19:52 | |
morgan_ | SpamapS: this is largely what I ran across in my previous attempts at this | 19:53 |
SpamapS | morgan_: yep | 19:53 |
SpamapS | I'm not saying it can't be done | 19:53 |
Shrews | jeblair: the ZK data looks correct, but it doesn't seem to be returning the most recent upload across builds (for whatever reason) | 19:53 |
SpamapS | just that it's a whole lot of new code for convenience sake. | 19:53 |
Shrews | jeblair: putting together a unit test for it now | 19:53 |
morgan_ | SpamapS: it's an issue with python3 and the technology we're using, it's doable, it's just unfun and what you just said. alot of code for working around base features of what we're using | 19:54 |
morgan_ | SpamapS: for some minor convienence | 19:54 |
morgan_ | but i agree it is doable. just not sure if it is worth it | 19:54 |
SpamapS | Well I made one that didn't use inheritance | 19:54 |
SpamapS | so the facade was only on the user side | 19:54 |
SpamapS | internally gear just kept using Job and WorkerJob | 19:55 |
SpamapS | that was a lot less code, but... maybe a bit confusing code ;) | 19:55 |
morgan_ | ah yeah | 19:56 |
morgan_ | i could see that | 19:56 |
SpamapS | https://review.openstack.org/398560 | 19:56 |
SpamapS | ^^ this is just to make name always utf-8 assumed (so you always get strings back from job.name) | 19:56 |
SpamapS | which is technically an API break | 19:56 |
SpamapS | morgan_: if you want to look at the facade I made.. it's here for now: https://review.openstack.org/393544 | 19:57 |
morgan_ | i think it's an ok API break because it is unlikely that anyone has seriously used it in py3 until very recently at best | 19:57 |
morgan_ | and in py2... again not really something anyone dealt with | 19:57 |
SpamapS | Yeah it doesn't break py2 at all | 19:58 |
SpamapS | and the existing tests are unchanged | 19:58 |
SpamapS | so it's a really, really subtle API break | 19:58 |
morgan_ | yeah | 19:58 |
SpamapS | I should actually add a test for the name | 19:58 |
SpamapS | in string | 19:58 |
openstackgerrit | Merged openstack-infra/nodepool: Add a fallback exception handler for CleanupWorker https://review.openstack.org/398527 | 20:01 |
morgan_ | I personally prefer the application using gear managing the binary conversion/stringifying vs making gear smarter. | 20:01 |
SpamapS | morgan_: well I think the idea is to give the user a thing that does the simplest thing well. | 20:02 |
morgan_ | but that doesn't mean my view is correct for this case | 20:02 |
morgan_ | just personal preferance. | 20:02 |
clarkb | jeblair: starting at https://review.openstack.org/#/c/398535/1 that exception handling stack has failed pep8 | 20:03 |
jeblair | i hope the idea that gear providing an interface that lets this simple thing work for many use-cases (including the one gear was created for) without reducing our ability to interface at a low level is a compromise we can agree on. :) | 20:04 |
SpamapS | Like, in the facade version I put up, if you're using TextWorker, you get a string arguments, so your worker can just be like 'job.sendWorkComplete(process_data(json.loads(job.arguments)))' | 20:04 |
jeblair | clarkb: on it | 20:05 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Make image cleanup separable https://review.openstack.org/398535 | 20:06 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Make image building separable https://review.openstack.org/398542 | 20:06 |
SpamapS | jeblair: so, when you do get time to look at the one I just pushed up that just does name.. to do a full text client/worker/job set.. every attribute needs three methods.. 2 for access by the user, and 1 for library intra-class access to the binary attributes (we could perhaps just make the binary ones named, so not a method actually) | 20:06 |
clarkb | I am going to grab lunch but then will be back to keep reviewing those nodepool changes | 20:07 |
clarkb | pabelanger: don't forget fixing the zk connection stuff in nodepool.py (I can help write that change too if that helps you) | 20:07 |
pabelanger | clarkb: ya, working on that now | 20:08 |
jeblair | SpamapS: right -- though name (and unique?) are simple. data, as you point out, is slightly more complicated. but those are the only things we have to deal with, right? | 20:08 |
clarkb | pabelanger: awesome | 20:08 |
pabelanger | clarkb: just finished rechecking some failed, patches | 20:08 |
pabelanger | clarkb: I think you can continue your revires | 20:08 |
openstackgerrit | Merged openstack-infra/nodepool: Add a fallback exception handler for BuildWorker https://review.openstack.org/398528 | 20:08 |
pabelanger | reviews* | 20:08 |
jeblair | SpamapS: i'll take a closer look after lunch | 20:09 |
SpamapS | jeblair: arguments needs to be textified too | 20:09 |
clarkb | pabelanger: ya I can, but need food | 20:10 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: Add test for getMostRecentImageUpload https://review.openstack.org/398565 | 20:10 |
Shrews | jeblair: nm, it appears to work according to ^^^. issue lies elsewhere | 20:10 |
Shrews | oh! but i realize where the problem is, thanks to creating that test | 20:13 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Rename zookeeper_client to zk for nodepool.py https://review.openstack.org/398566 | 20:13 |
pabelanger | Shrews: clarkb: ^ | 20:14 |
pabelanger | that should address your previous comments | 20:14 |
* mordred announces that he has a very numb half face | 20:17 | |
pabelanger | quick get the trout | 20:17 |
mordred | :) | 20:20 |
pabelanger | never see this before: http://logs.openstack.org/66/398566/1/check/gate-nodepool-python27-db-ubuntu-xenial/6bdcc1d/console.html#_2016-11-16_20_17_19_099902 | 20:21 |
pabelanger | Resource temporarily unavailable (src/signaler.cpp:301) | 20:21 |
mordred | seems to be a zmq thing | 20:25 |
mordred | https://github.com/zeromq/libzmq/issues/1583 | 20:25 |
morgan_ | mordred: i welcome our new, half-numb-faced, overlords | 20:27 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: Add build_id attribute to ImageUpload object https://review.openstack.org/398568 | 20:29 |
mordred | morgan_: :) | 20:31 |
mordred | morgan_: half-numb-face is eversomuchfun | 20:31 |
morgan_ | mordred: i did whole-numb-face last dentist visit | 20:32 |
morgan_ | mordred: ok so i'm reinstalling my dev machine now and will be working on fixing betamax in ksa then onto task interface | 20:32 |
morgan_ | mordred: i have some ideas on testing :) | 20:32 |
morgan_ | mordred: hopefully have this wrapped up today/tomorrow/friday :) | 20:33 |
mordred | woot | 20:41 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: Re-enable test: test_config_validate https://review.openstack.org/398427 | 20:41 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: Re-enable test: test_dib_image_delete https://review.openstack.org/398418 | 20:41 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: Re-enable test: test_image_build https://review.openstack.org/398437 | 20:41 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: Re-enable tests: test_job_create, test_job_delete https://review.openstack.org/398440 | 20:43 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: Remove test: test_snapshot_image_update https://review.openstack.org/398445 | 20:43 |
Shrews | pabelanger: sorry, i had to be heads down for a bit to get some of this stuff rolled out. did you still need my help with something? | 20:44 |
pabelanger | Shrews: I did have a question about: https://review.openstack.org/#/c/398502/1/nodepool/zk.py | 20:45 |
pabelanger | I had to add recursive to fix an exception that kazoo was raising | 20:45 |
pabelanger | not sure if I was using the function properly or not | 20:45 |
Shrews | hrm, let me think about that a bit. not sure if we really want recursive there | 20:46 |
pabelanger | ack | 20:46 |
pabelanger | like I said, it is possible I am using it incorrectly | 20:46 |
pabelanger | was trying to delete an image | 20:47 |
Shrews | pabelanger: we shouldn't delete a build unless the uploads have been deleted first... which may be the reason i didn't have recursive there | 20:47 |
Shrews | gotta revisit the logic a bit | 20:47 |
mordred | Shrews: I believe you have an oops in https://review.openstack.org/#/c/398568 ? | 20:47 |
Shrews | mordred: ugh | 20:48 |
pabelanger | yay | 20:48 |
pabelanger | http://logs.openstack.org/79/398379/4/check/gate-dsvm-nodepool/a7fc84b/logs/nodepool-image-list.txt.gz | 20:48 |
mordred | Shrews: have I mentioned the dislike for positional parameters I've developed? | 20:48 |
pabelanger | now to see why the job failed | 20:48 |
mordred | pabelanger: it's the name | 20:49 |
mordred | pabelanger: the test is doing a grep for 'trusty-server' | 20:49 |
pabelanger | mordred: Ah | 20:50 |
pabelanger | thanks | 20:50 |
mordred | pabelanger: that's SUPER EXCITING | 20:50 |
pabelanger | http://logs.openstack.org/79/398379/4/check/gate-dsvm-nodepool/a7fc84b/logs/screen-nodepool.txt.gz#_2016-11-16_19_50_02_309 | 20:50 |
pabelanger | NODE ONLINE | 20:50 |
pabelanger | wee | 20:50 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: Add build_id attribute to ImageUpload object https://review.openstack.org/398568 | 20:50 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: Re-enable test: test_dib_image_delete https://review.openstack.org/398418 | 20:51 |
mordred | pabelanger: I love it when we work really hard and the end result is that we have booted a node | 20:52 |
mordred | pabelanger: and we're legitimately excited about having done so | 20:52 |
pabelanger | ++ | 20:52 |
pabelanger | it is exciting for sure | 20:52 |
mordred | pabelanger: I remember being similarly excited when we replaced the nodepool calls to openstack with shade | 20:53 |
mordred | and the end result of much pain was a thing that worked exactly the same as before | 20:53 |
timrc | It has more potential now, though :) | 20:54 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Enabled zookeeper for devstack jobs https://review.openstack.org/398379 | 20:54 |
pabelanger | mordred: okay, that should do it^. Was looking for trusty-server, like you said, which was a snaphot build | 20:54 |
pabelanger | we can remove it now | 20:54 |
Shrews | pabelanger: so... | 20:56 |
Shrews | about that test... it seems redundant (see https://review.openstack.org/398418), and adding recursive there is not what we want | 20:56 |
pabelanger | Shrews: Ya, I think we are going to delete it now, since it lives in test_nodepool.py, but I wanted to see if recursive was needed or not | 20:57 |
clarkb | the command test is not redundant right? its trsting the cli works | 20:58 |
clarkb | maybe I am talking about different test | 20:58 |
Shrews | pabelanger: what should happen is you mark the build as 'deleted' in ZK, and then the builder janitor thread will do the actual deleting of things (from provider and disk) | 20:59 |
pabelanger | ya | 20:59 |
Shrews | pabelanger: so calling zk.deleteBuild() there is not the right way | 20:59 |
pabelanger | clarkb: I think we agreed in backscroll to add a new test to make sure nodepool-builder actually did the deletes from disk | 21:00 |
pabelanger | Shrews: k, when do we use it? | 21:00 |
clarkb | pabelanger: correct but you still need a test that the cli worls too | 21:00 |
clarkb | ypu need both | 21:00 |
pabelanger | yup | 21:00 |
Shrews | pabelanger: after you call deleteUpload() for all of the uploads of that build. | 21:01 |
Shrews | pabelanger: if we recursively delete from the build node downward, then we'd be missing the provider upload information | 21:02 |
pabelanger | right | 21:03 |
Shrews | so i guess it's sort of a safety mechanism to not recursively delete | 21:03 |
pabelanger | So, that means we cannot delete a image on disk without deleting it from the provider uploads too? | 21:03 |
Shrews | pabelanger: you can delete on disk stuff, just gotta mark the build node as 'deleted' when you do | 21:04 |
clarkb | that seema like an anyi feature | 21:04 |
clarkb | ah ok | 21:04 |
pabelanger | Ya, trying to think how it works today | 21:05 |
clarkb | (sometimes you run out of disk) | 21:05 |
pabelanger | right :) | 21:05 |
Shrews | pabelanger: this code is pretty much what i just described: https://github.com/openstack-infra/nodepool/blob/feature/zuulv3/nodepool/builder.py#L316-L321 | 21:05 |
openstackgerrit | Merged openstack-infra/nodepool: Add test for getMostRecentImageUpload https://review.openstack.org/398565 | 21:10 |
clarkb | pabelanger: ok back to review that stack | 21:11 |
pabelanger | yay | 21:11 |
jeblair | clarkb: if that becomes a problem, we could probably delete the actual file on disk before we delete the znode (which should still only happen after the uploads are deleted) | 21:11 |
Shrews | pabelanger: btw, we should just be setting state on zk nodes to trigger things happening. so you shouldn't have to try to model that same logic elsewhere | 21:12 |
pabelanger | Shrews: okay, I'll do that moving forward | 21:12 |
clarkb | jeblair: I guess I don't think we need the strict enforcement that an image must not exist in a cloud for it to be removed locally, I think in an ideal world thats the case but we run out of disk all the time | 21:12 |
jeblair | clarkb: i don't think we disagree? | 21:14 |
clarkb | I had to reread your comment a few times but now I think I get it | 21:14 |
clarkb | basically you are saying its ok to have a znode in deleted state and no content on disk | 21:15 |
jeblair | yeah. that's not how it works now. but i think we can try it for a little bit, and if we need to reorder it, we can. it should be a simple change and does not substantially change the deletion algorithm. | 21:15 |
clarkb | so zk still has a record of the thing but its not consuming any real bytes on your filesystem (outside of a smal record in zk) | 21:15 |
jeblair | clarkb: correct | 21:15 |
clarkb | gotcha ya that should work too | 21:15 |
jeblair | Shrews: a suggestion on 398437, but i think the logic looks right. | 21:23 |
Shrews | jeblair: can totally just pass the object in. good idea | 21:25 |
Shrews | i like to complicate things | 21:25 |
Shrews | and then have you fix me | 21:25 |
Shrews | :) | 21:25 |
jeblair | i like to simplify things. we're a team. | 21:25 |
clarkb | jeblair: Shrews https://review.openstack.org/#/c/398535/2 commenst on that that I think is related | 21:29 |
clarkb | can you tell me if I am wrong on those two things (I +2'd because its not a regression, but suggestions for simplification) | 21:29 |
openstackgerrit | Merged openstack-infra/nodepool: Re-enable test_node_ipv6 test https://review.openstack.org/398446 | 21:29 |
jeblair | clarkb leaves comment requiring brain thinking | 21:30 |
clarkb | tldr I think we are hitting zk way more than necessary :) | 21:31 |
pabelanger | Yay! | 21:31 |
pabelanger | devstack job worked | 21:31 |
pabelanger | http://logs.openstack.org/79/398379/5/check/gate-dsvm-nodepool-src-shade/dfc3449/console.html | 21:31 |
pabelanger | Excellent work Shrews and jeblair! | 21:32 |
jeblair | pabelanger: yays! | 21:33 |
*** abregman|afk has quit IRC | 21:35 | |
jeblair | clarkb: a build that is not ready, building, or deleted doesn't leave much... that pretty much covers all of the current states... | 21:38 |
clarkb | jeblair: well we already exclude ready, building, and deleted there | 21:39 |
clarkb | we just do it in a really round about way | 21:39 |
clarkb | oh we aren't excluding deleted | 21:39 |
jeblair | i don't think so... i'll reply inline | 21:39 |
clarkb | just building and ready | 21:39 |
openstackgerrit | Merged openstack-infra/nodepool: Re-enable test_node_delete_success test https://review.openstack.org/398448 | 21:39 |
clarkb | but still I think we nca more easily, make list excluding building and ready once (cuts down on zk queries) then continue | 21:39 |
openstackgerrit | Merged openstack-infra/nodepool: Re-enable test_node_delete_failure test https://review.openstack.org/398450 | 21:39 |
openstackgerrit | Merged openstack-infra/nodepool: Re-enable test_db test https://review.openstack.org/398451 | 21:40 |
openstackgerrit | Merged openstack-infra/nodepool: Re-enable test_leaked_node test https://review.openstack.org/398458 | 21:40 |
jeblair | clarkb: basically, we want to delete any build that is 'delete', ignore any 'building' that's actually building, delete any 'building' that isn't building, and delete any ready older than the 2 most recent | 21:40 |
jeblair | so it's not super straightforward :) | 21:40 |
clarkb | ah tahts what the 2 there means ok. So I think the case we don't need to handle is buidling that isn't building | 21:41 |
clarkb | we should just make sure that that always updates to delete? | 21:41 |
jeblair | clarkb: it's to handle the case where someone hits the builder with a sigkill | 21:41 |
jeblair | or oom or whatever | 21:41 |
clarkb | I see | 21:42 |
clarkb | ok so maybe we can't easily simplify that portion of the code, I think the second comment is doable though? | 21:42 |
clarkb | hrm actually because its over all known providers and not just providers with uploads that may not work (depending on how case of delete upload that doesn't exist is handled) | 21:43 |
Shrews | part of the thing that complicates that code is it must not assume that it has access to all providers that might be recorded in zookeeper. | 21:46 |
Shrews | in that case that one might want to split up provider ownership across different machines | 21:47 |
jeblair | (which is difficult, but doable with a network file system. any one will do.) | 21:47 |
pabelanger | clarkb: I think you can also review https://review.openstack.org/#/c/398379/ now. That brings our dsvm jobs online again | 21:47 |
clarkb | pabelanger: yup I have pulled that change and a couple others up for review next | 21:47 |
clarkb | jeblair: Shrews: for the keeping a deleted node around? I agree but I also think we can use error handling to more easily express that in the code and avoid more zk lookups | 21:48 |
clarkb | jeblair: Shrews: basically if I failed to delete this then keep the deleted nodearound in zk | 21:48 |
clarkb | ratherthan doing a second pass to see what was and wasn't deleted | 21:48 |
clarkb | Shrews: is the problem there that you may not even attempt to delete from a provider bceause known providers is known to me only? | 21:49 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Log image delete exception as exception https://review.openstack.org/398596 | 21:49 |
clarkb | pabelanger: is there a snapshot version of https://review.openstack.org/#/c/398525/1 ? | 21:51 |
pabelanger | clarkb: not yet, but can get started | 21:51 |
jeblair | clarkb: yeah, imagine that we ran an image builder local to each region we use (but they shared a filesystem, so they used the same image files, but we wanted to be local when uploading). i don't know if it's a *good* topology, but it would work with this. :) | 21:52 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Remove DibImage from the database https://review.openstack.org/398525 | 21:53 |
pabelanger | clarkb: sorry, missed some code | 21:53 |
clarkb | jeblair: Shrews we should probably document that? seems like a very subtle thing that would be easy to break in the future (also maybe test that specific behavior of two different builders with the same "backend" fs) | 21:54 |
jeblair | clarkb, Shrews: as i think about this, we may have some problems removing a provider. i think that would look exactly like this case to the builder. | 21:54 |
Shrews | jeblair: yes, that could pose a problem with the existing cleanup logic | 21:55 |
clarkb | pabelanger: http://logs.openstack.org/79/398379/5/check/gate-dsvm-nodepool/82c6e61/console.html failed | 21:56 |
clarkb | pabelanger: so I don't know that we are quite there yet on making the integration testing work | 21:56 |
Shrews | my brain is fried from all the context switching on things. time to step away for a bit | 21:57 |
pabelanger | clarkb: I think that is the neutron bug, causing dib to fail to build | 21:58 |
* mordred hands Shrews some nice fried pig brians | 21:58 | |
clarkb | pabelanger: ah | 21:58 |
clarkb | pabelanger: wonderful | 21:58 |
pabelanger | ya | 21:58 |
pabelanger | http://logs.openstack.org/79/398379/5/check/gate-dsvm-nodepool/82c6e61/logs/screen-nodepool-builder.txt.gz | 21:58 |
jeblair | yeah, last thing there is get-pip | 21:59 |
clarkb | pabelanger: I also notice that you don't explicitly start zookeeperd, I thought that it didn't auto start on ubuntu? was i mistaken? | 21:59 |
jeblair | clarkb: i think it does if you install the zookeeperd package (we do) | 21:59 |
clarkb | oh are there different packages with different behavior? interesting | 21:59 |
pabelanger | Ya, zookeeperd has the init.d scripts | 22:00 |
jeblair | clarkb: there's a separate server package and init scripts | 22:00 |
mordred | yah. zookeeper has the zookeeper | 22:00 |
pabelanger | right | 22:00 |
clarkb | gotcha | 22:00 |
mordred | it's a great design | 22:00 |
mordred | I wish more things did it | 22:00 |
jeblair | "it's a great design" [evaluation: 50% troll probability] "I wish more things did it" [evaluation: 15% troll probability] | 22:01 |
clarkb | and with the kazoo logging fix we no longer get 12MB compressed files of logs | 22:01 |
mordred | jeblair: :) | 22:01 |
jeblair | mordred: the trollocop plugin for irssi is indespensible | 22:02 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Remove SnapshotImage from the database https://review.openstack.org/398602 | 22:02 |
pabelanger | and that should be the snapshot bits from the database | 22:02 |
mordred | jeblair: I like that it serves the half of the people who believe that installing a service should imply starting it AND the half of people who believe the opposite | 22:02 |
jeblair | clarkb, pabelanger: so i heard some talk about using a cached get-pip? what's up with that? | 22:02 |
mordred | that is, of course, one of the fundamental philosophical divergences between debian and red hat | 22:02 |
jeblair | mordred: turns out, we *can* all just get along. | 22:02 |
clarkb | jeblair: we can set dib env vars such that it will use the cached version of that file that we cache for devstack | 22:03 |
mordred | right? | 22:03 |
clarkb | (I think) | 22:03 |
mordred | yah | 22:03 |
mordred | it's DIB_REPOLOCATION_something_something | 22:03 |
mordred | greghaynes: ^^ | 22:03 |
pabelanger | clarkb: jeblair: Ya, we can first check it exists on dist, then add it for the gate | 22:03 |
pabelanger | disk* | 22:03 |
greghaynes | hrm? | 22:03 |
jeblair | has that been implemented elsewhere so we can copy it? | 22:03 |
greghaynes | oh, yes, DIB_REPOLOCATION_pip_and_virtualenv=/some/path | 22:03 |
greghaynes | worst var name ever | 22:04 |
clarkb | greghaynes: basically neutron is breaking our ability to internet to fetch get-pip.py when building a dib image for nodepool integration test | 22:04 |
clarkb | greghaynes: but we already have a copy on the test instance so maybe we can reuse it | 22:04 |
mordred | yah - so just set that var to where we store get-pip.py and we should be good | 22:04 |
jeblair | http://git.openstack.org/cgit/openstack-infra/nodepool/tree/devstack/plugin.sh#n254 | 22:04 |
clarkb | greghaynes: http://logs.openstack.org/79/398379/5/check/gate-dsvm-nodepool/82c6e61/logs/screen-nodepool-builder.txt.gz#_2016-11-16_21_10_43_097 is the logging where the internets don't work | 22:05 |
greghaynes | awesome, I think trove was hitting that too | 22:05 |
mordred | greghaynes: see jeblair's link | 22:05 |
clarkb | I think its /opt/stack/new/devstack/files/get-pip.py | 22:05 |
mordred | it seems we ARE setting it | 22:05 |
jeblair | but only if NODEPOOL_CACHE_GET_PIP is set, right? | 22:05 |
pabelanger | let me check quickly | 22:05 |
mordred | NODEPOOL_CACHE_GET_PIP=/opt/stack/cache/files/get-pip.py | 22:05 |
mordred | that's in the top of the file | 22:05 |
jeblair | oh i think this is a master/zuulv3 thing | 22:06 |
clarkb | not sure if that is the right path | 22:06 |
*** hashar has quit IRC | 22:06 | |
greghaynes | im not sure the file:// is what you want to have set either | 22:06 |
jeblair | like it may not be added to v3 branch | 22:06 |
clarkb | jeblair: oh we need to merge that into v3 | 22:06 |
pabelanger | ha | 22:06 |
mordred | :) | 22:06 |
mordred | what? we solved this before? | 22:06 |
clarkb | jeblair: http://git.openstack.org/cgit/openstack-infra/nodepool/tree/devstack/plugin.sh?h=feature/zuulv3 | 22:06 |
clarkb | jeblair: so ya think thats the case | 22:06 |
mordred | I agree | 22:06 |
pabelanger | looks like it | 22:06 |
clarkb | apparentl;y we don'y use the default devstack files path either | 22:07 |
clarkb | maybe thats because we want to move things around before devstack is cloned | 22:07 |
pabelanger | path is correct for get-pip.py | 22:07 |
openstackgerrit | Merged openstack-infra/nodepool: Enabled zookeeper for devstack jobs https://review.openstack.org/398379 | 22:07 |
clarkb | mordred: it turns out that neutron has been broken since august | 22:08 |
clarkb | mordred: and attempts to fix it have been lacking so everyone and their uncle has worked around it | 22:08 |
mordred | clarkb: yah | 22:08 |
mordred | well - there is a devstack patch outstanding ... | 22:09 |
clarkb | yup and its on my list of things to review today :) | 22:09 |
mordred | https://review.openstack.org/398012 ... so far no devstack cores have reviewed | 22:09 |
mordred | hey jeblair ^^ I hear you have +2 access ... | 22:09 |
clarkb | ianw and sc68cl too iirc | 22:10 |
mordred | clarkb: ooh. I'll bug them | 22:10 |
greghaynes | I also think the line should be DIB_REPOLOCATION_pip_and_virtualenv: $NODEPOOL_CACHE_GET_PIP BTW | 22:10 |
greghaynes | or, is that tested somewhere? | 22:10 |
mordred | greghaynes: it's how the master branch nodepool jobs run | 22:10 |
greghaynes | hrm | 22:10 |
mordred | lemme find log | 22:10 |
greghaynes | how on earth does that work | 22:11 |
mordred | greghaynes: http://logs.openstack.org/50/397750/1/check/gate-dsvm-nodepool-src-shade/41ba133/ | 22:11 |
ianw | mordred: ok ... that's a lota changelog. i was probably scared when i last went through devstack queue :) | 22:11 |
clarkb | mordred: we need a devstack-gate change to set that var too | 22:11 |
mordred | http://logs.openstack.org/50/397750/1/check/gate-dsvm-nodepool-src-shade/41ba133/logs/screen-nodepool-builder.txt.gz | 22:11 |
clarkb | mordred: do you know if that exists yet? | 22:11 |
*** Shuo_ has joined #zuul | 22:11 | |
mordred | clarkb: I will make one right now | 22:11 |
clarkb | mordred: kk | 22:11 |
greghaynes | mordred: TIL that curl supports curl file:///some/path | 22:12 |
*** Shuo has quit IRC | 22:12 | |
greghaynes | mordred: which is how that works | 22:12 |
clarkb | curl is magical | 22:12 |
greghaynes | so, if you didnt have file:// dib would notice its a path and not shell out to curl | 22:13 |
greghaynes | but yea, not an issue apparently because curl is awesome | 22:13 |
openstackgerrit | Merged openstack-infra/nodepool: Rename zookeeper_client to zk for nodepool.py https://review.openstack.org/398566 | 22:14 |
mordred | clarkb: https://review.openstack.org/398611 | 22:14 |
mordred | greghaynes: neat! | 22:14 |
clarkb | mordred: what do you think about supporting a more easy to understand devstack_gate var too? | 22:15 |
clarkb | mordred: basically support the old thing for compat but also support a new var | 22:15 |
clarkb | so DEVSTACK_GATE_IPv4_ADDRS_SAFE_TO_USE defaults to DEVSTACK_GATE_IPv4_FIXED_RANGE | 22:15 |
clarkb | oh wait | 22:15 |
clarkb | thats what it does already? do we not have this in the -wrap script? | 22:16 |
mordred | clarkb: sorry - I talked about that in infra channel because it started feeling like this was less zuul related | 22:16 |
clarkb | wow ya we don't have it in -wrap | 22:16 |
mordred | yah - I think we should just drop the D_G var | 22:16 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Remove stray print statement https://review.openstack.org/398623 | 22:18 |
clarkb | is anyone working on cherry picking the plugin fix on master to v3 yet? | 22:20 |
clarkb | I can do that as soon as I finish reviewing these devstack thigns | 22:20 |
mordred | clarkb: I can do it - if nobody else is | 22:21 |
mordred | pabelanger: https://review.openstack.org/#/c/398596 ... was it the , you were concerned about? | 22:21 |
clarkb | mordred: I am done with devstack reviews, I will go ahead and cherry pick now | 22:23 |
openstackgerrit | Monty Taylor proposed openstack-infra/nodepool: Install get-pip.py from local cached location https://review.openstack.org/398626 | 22:23 |
mordred | clarkb: oh - sorry - was already doing it | 22:24 |
clarkb | oh no biggy | 22:24 |
clarkb | now I just have to review it | 22:24 |
mordred | clarkb: I'm going to follow up with a patch merging the rest of the devstack plugin changes | 22:25 |
clarkb | mordred: the indentation around that heredoc stuff just melted my brain | 22:26 |
mordred | yah | 22:26 |
mordred | clarkb: I'm missing an fi I think | 22:28 |
clarkb | in deed you are | 22:29 |
clarkb | see how melted my brain is? | 22:29 |
openstackgerrit | Monty Taylor proposed openstack-infra/nodepool: Install get-pip.py from local cached location https://review.openstack.org/398626 | 22:30 |
openstackgerrit | Monty Taylor proposed openstack-infra/nodepool: Sync devstack plugin from master https://review.openstack.org/398632 | 22:30 |
mordred | clarkb: ok - there's the fi and also the rest of the changes to the devstack/plugin.sh from master | 22:31 |
mordred | which we might as well sync because we'll need to sync them when we merge back anyway :) | 22:31 |
Shuo_ | gerrit seems to built for working with git. Just wondering, has anyone made gerrit work with perforce (at least I know google's internal code review system was built along with perforce back then :-) ) | 22:32 |
mordred | Shuo_: not to my knowledge - but gerrit is VERY deeply tied to git | 22:32 |
mordred | Shuo_: as is zuul | 22:32 |
mordred | Shuo_: like, gerrit is based on a jgit implementation - it would be a substantial rewrite of most of it to accomodate non-git things | 22:33 |
jeblair | mordred: one thing on your plugin sync change | 22:34 |
mordred | woot | 22:35 |
openstackgerrit | Monty Taylor proposed openstack-infra/nodepool: Sync devstack plugin from master https://review.openstack.org/398632 | 22:35 |
mordred | jeblair: thanks - that was a line I was unsure on | 22:35 |
pabelanger | mordred: Oh, maybe it was line 257 that is the issue | 22:37 |
pabelanger | shows up red in gerrit ui | 22:38 |
mordred | pabelanger: which patch? | 22:38 |
pabelanger | 398596, sorry for not saying | 22:39 |
mordred | no worries! | 22:39 |
mordred | pabelanger: ah! yes - that does seem to be unhappy line | 22:39 |
clarkb | mordred: one more thing on that change | 22:39 |
clarkb | 632 that is | 22:39 |
openstackgerrit | Monty Taylor proposed openstack-infra/nodepool: Sync devstack plugin from master https://review.openstack.org/398632 | 22:41 |
mordred | clarkb: done. thank you | 22:41 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Log image delete exception as exception https://review.openstack.org/398596 | 22:41 |
openstackgerrit | Merged openstack-infra/nodepool: Make image cleanup separable https://review.openstack.org/398535 | 22:50 |
openstackgerrit | Merged openstack-infra/nodepool: Make image building separable https://review.openstack.org/398542 | 22:51 |
openstackgerrit | Merged openstack-infra/nodepool: Add build_id attribute to ImageUpload object https://review.openstack.org/398568 | 22:51 |
openstackgerrit | Merged openstack-infra/nodepool: Re-enable test: test_dib_image_delete https://review.openstack.org/398418 | 22:51 |
ianw | mordred: FIXED_RANGE and subnet pools are mutually exclusive right? you can't just setup your network on fixed range, you've got to ask for a subnet pool? | 22:52 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Remove SnapshotImage from the database https://review.openstack.org/398602 | 22:53 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Remove DibImage from the database https://review.openstack.org/398525 | 22:53 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Replace snap_image with cloud_image https://review.openstack.org/398638 | 22:53 |
pabelanger | clarkb: I went with cloud_image ^ | 22:55 |
mordred | ianw: yes | 22:55 |
mordred | ianw: they are different things - which is why the complexity | 22:56 |
mordred | ianw: SUBNETPOOL_PREFIX may describe a larger network from which many smaller subnets are created | 22:56 |
openstackgerrit | Merged openstack-infra/nodepool: Install get-pip.py from local cached location https://review.openstack.org/398626 | 22:56 |
mordred | ianw: FIXED_RANGE describes a single network to be used directly without subdivision | 22:56 |
mordred | ianw: AIUI | 22:56 |
ianw | mordred: ok, cool. yes that's AIUI too so glad we agree :) | 22:58 |
mordred | \o/ | 22:58 |
mordred | it's almost like I've learned things | 22:58 |
SpamapS | jeblair: FYI, I just sent a message to openstack-infra and asked people to bother you. So.. prepare to be bothered. ;) | 22:59 |
jeblair | SpamapS: wait people are only *just now* going to start bothering me?! | 23:00 |
jeblair | SpamapS: ++ | 23:03 |
mordred | jeblair: yes. up until now the feeling you've experienced is "joy" | 23:03 |
openstackgerrit | Merged openstack-infra/nodepool: Re-enable test_job_start_event test https://review.openstack.org/398463 | 23:03 |
mordred | jeblair: so just try to remember what it felt like | 23:03 |
jeblair | mordred: am i about to be inundated with "love"? | 23:03 |
mordred | jeblair: sure. that's definitely the right word | 23:06 |
mordred | SpamapS: good email | 23:07 |
*** Cibo has quit IRC | 23:08 | |
openstackgerrit | Merged openstack-infra/nodepool: Sync devstack plugin from master https://review.openstack.org/398632 | 23:09 |
pabelanger | jeblair: Shrews: So, I only have 3 more tests in test_nodepool.py to do, they are related to images. I think we agree to delete them and move them into test_builder | 23:17 |
pabelanger | Aside from that, dsvm job is now working too | 23:17 |
pabelanger | think we made some great process today | 23:17 |
pabelanger | and zookeeper.py works great! | 23:17 |
pabelanger | jeblair: one thing I did notice with our testing, our configs are still setup to use jenkins_manager.py. Should we make the switch to assign-via-gearman for our targets? | 23:20 |
jeblair | pabelanger: let's leave that alone for now. it's time will come when we do the next phase. :) | 23:21 |
openstackgerrit | Merged openstack-infra/nodepool: Remove stray print statement https://review.openstack.org/398623 | 23:21 |
pabelanger | jeblair: cool | 23:23 |
pabelanger | http://logs.openstack.org/93/398493/1/check/nodepool-coverage-db-ubuntu-xenial/0da901c/console.html | 23:27 |
pabelanger | nodepool.tests.test_nodepool.TestNodepool.test_subnodes failed there, did a recheck | 23:27 |
pabelanger | but going to see try and reproduce it | 23:27 |
clarkb | pabelanger: reading the logs almost looks like gearman server wasn't available/ | 23:28 |
pabelanger | clarkb: ya, I see that too | 23:29 |
pabelanger | but looks like after the schedule was shutdown | 23:29 |
clarkb | the scheduler shouldn't stop until the job ends and that job didn't end properly | 23:31 |
clarkb | oh! | 23:31 |
clarkb | pabelanger: checkout the traceback, this is a long standing known issue | 23:31 |
clarkb | pabelanger: this is the case where we try to drop the db but it doesn't drop | 23:31 |
pabelanger | Ah | 23:31 |
ianw | mordred: ok, i think i got my head around it; LGTM but i think follow-ons could help as mentioned too | 23:31 |
pabelanger | clarkb: first time I've seen it | 23:32 |
clarkb | pabelanger: so its timing out in the job cleanup which is trying to drop the db | 23:32 |
pabelanger | ya, I see that now | 23:32 |
clarkb | pabelanger: I had a few changes up to try and debug it by outputing mysql debugging info but that didn't really get anywhere | 23:32 |
pabelanger | ack | 23:32 |
clarkb | pabelanger: I would probably ignore it unless you see it happen frequently, in which case we can resurrect those debugging changes and try again | 23:33 |
clarkb | looks like Shrews' stack of changes got into a merge conflict | 23:35 |
mordred | boo | 23:35 |
pabelanger | Ya, that is likely my fault :( | 23:36 |
pabelanger | when we change node_dib.yaml to node.yaml today | 23:36 |
clarkb | pabelanger: jeblair mordred can I get reviews on https://review.openstack.org/#/c/398558/ before too long? that way we can restart the launchers soonish hopefully and get better error logging | 23:49 |
pabelanger | checking | 23:50 |
pabelanger | clarkb: +2 | 23:51 |
clarkb | thanks | 23:51 |
mordred | clarkb: +A | 23:52 |
openstackgerrit | Merged openstack-infra/nodepool: Remove DibImage from the database https://review.openstack.org/398525 | 23:52 |
pabelanger | clarkb: https://review.openstack.org/#/c/398638 is my last stack of the day | 23:53 |
jeblair | abadger suggested a fix for that if someone wants to do some ansible hacking | 23:53 |
jeblair | described in https://github.com/ansible/ansible/issues/18281 | 23:53 |
pabelanger | that removes our database tables for snaphots | 23:53 |
openstackgerrit | Merged openstack-infra/zuul: Don't retry when using synchronize module https://review.openstack.org/398558 | 23:55 |
clarkb | I have promised a new zanata dev server this week so probably need to start that tomorrow, maybe if that goes quickly I can figure out the ansible fixes | 23:56 |
Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!