SpamapS | hrm | 00:04 |
---|---|---|
SpamapS | so why was job_has_change removed from ZuulTestCase ? | 00:04 |
* SpamapS can't find the commit that did it | 00:04 | |
clarkb | SpamapS: git log -p is magical way to find commits that do things | 00:05 |
clarkb | (I use it far too much) | 00:05 |
SpamapS | clarkb: indeed, I forgot about that | 00:06 |
clarkb | pabelanger: I rechecked that stack because I selfishly want to see integration tests work there. Will approve once I get taht data back | 00:06 |
clarkb | pabelanger: I mean assuming the tests check out like initial code review does | 00:06 |
jeblair | SpamapS: looks like I654a269d37c0bc323ed73afa68a73ddd558be7e2 | 00:06 |
SpamapS | indeed | 00:06 |
SpamapS | so, the problem is, there are some tests that don't have any builds | 00:06 |
SpamapS | because they're testing the queue | 00:07 |
SpamapS | so they just have a gear.Job | 00:07 |
SpamapS | and need to go ahead and load the json, the same way that function did | 00:07 |
jeblair | SpamapS: have an example handy? | 00:08 |
SpamapS | jeblair: test_failed_change_at_head_with_queue | 00:08 |
SpamapS | specifically asserts that there are no builds yet | 00:08 |
SpamapS | I think it may be worth writing a new version of job_has_changes that is much smaller than the old one | 00:09 |
SpamapS | maybe call it queue_job_has_changes to differentiate | 00:09 |
SpamapS | jeblair: and I understand that the things expected to be in those jobs changes.. like the name, and some of the content | 00:14 |
jeblair | SpamapS: yeah, i think we pass along 'items' as a parameter to the build function, so you should be able to inspect that | 00:14 |
jeblair | SpamapS: (items contains the instructions on what changes to include) | 00:15 |
jeblair | SpamapS: yes, the _gearman_ job name is no longer interesting (it is simply 'build'), so you'll want to look for the 'job' argument of the job. | 00:17 |
jeblair | in clase that was clear as mud | 00:17 |
openstackgerrit | Merged openstack-infra/nodepool: Re-enable test_job_end_event test https://review.openstack.org/398493 | 00:18 |
SpamapS | indeed, I had moved past that one | 00:18 |
SpamapS | ok yeah items is what I need | 00:18 |
jeblair | SpamapS: the gearman job named 'build' has arguments telling the launcher what to run. one of those arguments is 'job' which is the zuul job name (eg, 'project-test1'). another argument is 'items' which contains the list of changes to be merged for the test. | 00:18 |
SpamapS | This is actually the only place that uses job_has_changes with the gear job tho | 00:19 |
pabelanger | clarkb: sounds good | 00:19 |
jeblair | SpamapS: it may not be worth it then | 00:19 |
SpamapS | Yeah exactly.. just going to do the extraction there | 00:19 |
jeblair | SpamapS: also, that seems like a slightly extraneous check. you could probably drop it entirely and i don't think we'd miss it. | 00:20 |
jeblair | (that single specific assertion, i mean) | 00:21 |
openstackgerrit | Merged openstack-infra/nodepool: Re-enable test_job_auto_hold_* tests https://review.openstack.org/398495 | 00:21 |
openstackgerrit | Merged openstack-infra/nodepool: Remove SnapshotImage from the database https://review.openstack.org/398602 | 00:26 |
pabelanger | clarkb: excellent, all green! | 00:27 |
clarkb | yup | 00:28 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Delete test_handle_dib_build_gear_disconnect test https://review.openstack.org/398655 | 00:29 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul: Re-enable test_failed_change_at_head_with_queue https://review.openstack.org/398656 | 00:30 |
SpamapS | jeblair: ^ another one bites the dust | 00:30 |
jeblair | pow! | 00:30 |
openstackgerrit | Merged openstack-infra/nodepool: Replace snap_image with cloud_image https://review.openstack.org/398638 | 00:31 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Delete image related tests from test_nodepool https://review.openstack.org/398657 | 00:32 |
pabelanger | jeblair: clarkb: okay, last tests from test_nodepool that were skipped. | 00:33 |
pabelanger | ^ | 00:33 |
jeblair | pabelanger: do those first 2 tests have equivalents in test_builder? | 00:37 |
jeblair | or, i guess test_command | 00:38 |
clarkb | I was going to ask, maybe we should at least stub out the things we need todo in the other places? | 00:38 |
clarkb | (help us to not forget) | 00:38 |
jeblair | well, if they aren't covered, i'd say we shouldn't land that change -- it should have the add + delete. | 00:38 |
jeblair | but they may be covered | 00:39 |
clarkb | I am good with that too | 00:39 |
openstackgerrit | Merged openstack-infra/zuul: Re-enable test_failed_change_at_head_with_queue https://review.openstack.org/398656 | 00:39 |
pabelanger | jeblair: not yet, but I can work on that tomorrow morning for test_builder | 00:40 |
pabelanger | then add them to the current patch | 00:40 |
clarkb | pabelanger: maybe wip the change in the mean time? | 00:41 |
clarkb | does https://review.openstack.org/#/c/398655/1/nodepool/tests/test_nodepool.py have a semi similar "test builder losing connectivity to zk" type test? | 00:41 |
pabelanger | clarkb: yup, did already | 00:41 |
pabelanger | clarkb: same, I'll get a replacement up too | 00:42 |
pabelanger | I'll WIP | 00:42 |
clarkb | thanks | 00:42 |
jeblair | i'm inclined to think that the cli tests are probably sufficient for the image delete tests | 00:42 |
jeblair | but if someone wants to argue we need additional tests for that, i'll listen :) | 00:42 |
clarkb | jeblair: well I think earlier they were saying that the cli tests should only check that zk gets updated and we need to test the background deletion separately? If thats teh case I don't think the cli tests are sufficient | 00:43 |
clarkb | (I may have misparsed that conversation) | 00:43 |
pabelanger | I think it would be good to check the disk to see if the image is actually removed, for a unit test. | 00:45 |
jeblair | clarkb: well, that would be one approach. if we went that way, then yes, we would need a second test to test the actual deletion. however, i would suggest that a cli test that verified the deletion happened would be good. | 00:45 |
clarkb | jeblair: ya I think master's cli tests work that way | 00:45 |
clarkb | so happy to set it up in that manner still | 00:45 |
jeblair | (also, i think it would be extra work to build the test the other way -- you would have to shut down the builder before you ran the command to prevent it racing) | 00:47 |
jeblair | clarkb, pabelanger: so i have +1d 657 with that comment. | 00:48 |
clarkb | ok | 00:48 |
clarkb | jeblair: re morgan_'s comment in -infra, is ZUUL_CHANGE expected to stick around long term? | 00:48 |
jeblair | the zk disconnect is a good question. it's not a 1:1 mapping here... | 00:48 |
pabelanger | jeblair: ack | 00:48 |
clarkb | jeblair: ya I think the rough mapping is you restart the builder in the middle of a build, does it properly update zk on startup | 00:49 |
jeblair | (since that was testing a gearman disconnect while a running gearman job handle was still open) | 00:49 |
jeblair | but certainly the idea of testing zk disconnects is a good one | 00:49 |
jeblair | clarkb: and yeah, that's the most likely case and one we should start with | 00:50 |
clarkb | what that test is guarding against is that we update databse state properly when things get restarted | 00:50 |
openstackgerrit | Ian Wienand proposed openstack-infra/nodepool: Separate image upload logs into separate logger https://review.openstack.org/397516 | 00:52 |
*** Shuo_ has quit IRC | 00:56 | |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Remove image delete tests from test_nodepool https://review.openstack.org/398657 | 01:02 |
pabelanger | jeblair: clarkb: So, https://review.openstack.org/#/c/398418/ test_dib_image_delete will check the state if the images. So, that gives us coverage for the tests removed in test_nodepool | 01:02 |
pabelanger | I'd still like to add a follow up test to make sure the image is removed from disk | 01:02 |
jeblair | pabelanger: that test will race the builder to actually delete it. so we should either modify that test to wait for the deletion to happen, or stop the builder before running the command (and then probably start it again and wait for deletion) | 01:27 |
*** bhavik1 has joined #zuul | 04:32 | |
*** abregman has joined #zuul | 06:23 | |
*** Cibo has joined #zuul | 06:41 | |
*** bhavik1 has quit IRC | 08:27 | |
*** abregman has quit IRC | 08:28 | |
*** abregman has joined #zuul | 08:30 | |
*** abregman has quit IRC | 08:40 | |
*** abregman has joined #zuul | 08:53 | |
*** hashar has joined #zuul | 09:34 | |
*** openstackgerrit has quit IRC | 09:48 | |
*** openstackgerrit has joined #zuul | 09:49 | |
*** Cibo has quit IRC | 10:07 | |
*** hashar is now known as hasharLunch | 11:33 | |
*** hasharLunch is now known as hasharAway | 12:29 | |
*** abregman has quit IRC | 13:43 | |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: Re-enable test: test_config_validate https://review.openstack.org/398427 | 13:53 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: Re-enable test: test_image_build https://review.openstack.org/398437 | 13:58 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: Re-enable tests: test_job_create, test_job_delete https://review.openstack.org/398440 | 14:01 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: Remove test: test_snapshot_image_update https://review.openstack.org/398445 | 14:01 |
SpamapS | Shrews: know anyting about 'irrelevant-files' replacing 'skip-if' ? | 14:22 |
SpamapS | I'm struggling to see how the new vision handles different irrelevant-files in different projects. | 14:23 |
Shrews | SpamapS: i do not understand your words, so I'm going to go with "no" | 14:23 |
SpamapS | Shrews: apparently 'irrelevant-files' has replaced 'skip-if' | 14:23 |
SpamapS | but I'm not sure I understand how that's supposed to work to define per-job per-project skips | 14:24 |
Shrews | SpamapS: ah, a zuul change, I assume. I have been too engrossed with nodepool to follow much of the zuul changes | 14:27 |
SpamapS | well you're no help then | 14:28 |
Shrews | this is often true | 14:31 |
Shrews | pabelanger: any chance nodepool.tests.test_nodepool.TestNodepool.test_node_vhd_image_StringException was one of the new tests you enabled? | 14:32 |
Shrews | pabelanger: http://logs.openstack.org/37/398437/3/check/gate-nodepool-python27-db-ubuntu-xenial/0087592/testr_results.html.gz | 14:32 |
Shrews | err, minus the "_StringException" there | 14:33 |
Shrews | ah, yep | 14:34 |
SpamapS | mordred jeblair see above.. I am looking at the change from skip-if to irrelevant-files. In the old test_parse_skip_if test, we test a situation where one job is fed by two projects, with different skip-if's in each project. I don't see how that's possible with irrelevant-files. | 14:38 |
mordred | SpamapS: I am also no help | 14:44 |
SpamapS | mordred: crikey mate. | 14:45 |
mordred | SpamapS: I do have a cup of coffee though - so I might become useful once I've consumed it | 14:45 |
SpamapS | mordred: yeah, I'm interested in understanding how integration jobs work in the new world order. But I hear the stompy-stomp of little feat overhead, so it's time to leave the basement and tend to the little ones for a while. | 14:48 |
mordred | SpamapS: stompy stompy! | 14:48 |
olaph | stompy, indeed | 15:02 |
pabelanger | Shrews: Yes, that was enabled yesterday. Looks like it is flapping? | 15:08 |
Shrews | pabelanger: we'll have to keep an eye on it | 15:09 |
pabelanger | Shrews: Oh, so this is a long bug that clarkb mentioned yesterday. We try to drop the database on clean up, but it fails. For now, we can recheck but we should keep track of the failures, so we can revisit to debug | 15:09 |
mordred | pabelanger, Shrews: it's also worth noting that the db is going away eventually, so unless it gets flappity flappity, it may not be worth solving | 15:10 |
mordred | but yes on keeping track of it | 15:11 |
openstackgerrit | Andreas Jaeger proposed openstack-infra/nodepool: Add tools/test-setup.sh https://review.openstack.org/399079 | 15:34 |
jeblair | SpamapS: if an irrelevant files should apply to all projects, it should be placed on the bare job definition. if it only applies to one project, then it should be placed on the job variant definition attached to the project-pipeline. | 15:48 |
jeblair | SpamapS: eg: http://paste.openstack.org/show/589613/ | 15:50 |
*** abregman has joined #zuul | 16:06 | |
*** abregman has quit IRC | 16:06 | |
openstackgerrit | Merged openstack-infra/nodepool: Re-enable test: test_config_validate https://review.openstack.org/398427 | 16:13 |
SpamapS | jeblair: what about a job that is triggered by two different projects, each of which need a different set of "irrelevant-files" defined. | 17:04 |
SpamapS | jeblair: so, like, an integration job in which nova wants to ignore 'contrib/*' but keystone actually wants to run if files in contrib/* change. | 17:05 |
jeblair | SpamapS: they'll each have a job variant defined on their project-pipeline | 17:05 |
SpamapS | Oh those are no longer the same job? | 17:05 |
jeblair | SpamapS: 'variant' is a term of art here meaning 'the same job but with different preconditions' | 17:06 |
jeblair | SpamapS: so the job is still named 'foo' in both cases, and the content of the job is the same, but the decision about whether and when to run the job is different in a variant | 17:07 |
jeblair | SpamapS: (variants can also alter the behavior by selecting different node types, or set different ansible variables) | 17:07 |
SpamapS | jeblair: Ok, I want to update the docs as I refactor the tests.. so I am trying to fully understand. | 17:08 |
SpamapS | And I truly do not at this point. | 17:08 |
SpamapS | but that's probably because I haven't thought about it for very long. | 17:08 |
* SpamapS will fake it, type something up, and we can edit together in review | 17:09 | |
jeblair | SpamapS: okay. i've avoided updating the docs so far because it's still quite a bit in flux (not all of the behaviors are fully defined yet), and honestly, the docs need a rewrite anyway. but we can certainly start to put some words down. :) | 17:11 |
jeblair | SpamapS: there's some discussion of this in the v3 spec here: http://specs.openstack.org/openstack-infra/infra-specs/specs/zuulv3.html#jobs | 17:12 |
jeblair | (though i used the word 'aspects' instead of 'variant' there. i'm open to which is the better term. :) | 17:12 |
SpamapS | jeblair: Well for me, if something is totally different, like this... I need some prose. | 17:12 |
SpamapS | Doesn't have to be in docs/* | 17:12 |
SpamapS | I could put it in the test description. | 17:12 |
pabelanger | jeblair: Shrews: Just a heads up, running into some issues validating an image is delete by nodepool-builder: http://paste.openstack.org/show/589624/ This is from testtools.run nodepool.tests.test_commands.TestNodepoolCMD.test_dib_image_delete, only change I made was to wait for a non-existent image to come online. Going to be working on this for the next little while | 17:13 |
SpamapS | jeblair: makes me quite nervous that such a fundamental thing is still considered to be in flux. | 17:13 |
SpamapS | I'd rather we be wrong at this point. | 17:13 |
SpamapS | than be confused. :-P | 17:13 |
rbergeron | (thinking out loud) -- I was just mailing Will Thames, the fine maintainer of ansible-lint who just happens to work for Shadowman in IT / engineering services ... was thinking maybe ansible-lint might be a useful thing to run on the jobs/playbooks themselves when they are written / submitted. | 17:16 |
rbergeron | i know it's a ways off and for all i know this could already be happening anyway on other things, but... figured i'd mention it :) | 17:16 |
rbergeron | </squirrel> | 17:18 |
pabelanger | rbergeron: Yup, some projects already do that today. The Ansible Openstack team has some jobs already in place for that, I think ansible-role-cloud-launcher too. I also do it for some playbooks / roles I am hosting upstream | 17:19 |
rbergeron | pabelanger: awesomesauce. was just thinking it might be useful when all the jobs are ansibley. :) | 17:21 |
pabelanger | ++ | 17:22 |
SpamapS | rbergeron: thank you, I didn't know there was a well maintained ansible-lint ... I have many uses for such a thing. | 17:22 |
gundalow | https://github.com/willthames/ansible-review & https://github.com/willthames/ansible-lint | 17:23 |
rbergeron | yeah, those. gundalow on the ball :) | 17:23 |
pabelanger | I haven't done ansible-review yet | 17:23 |
pabelanger | but ansible-lint is great | 17:23 |
gundalow | ditto | 17:23 |
rbergeron | SpamapS: will came out to ansiblefest in london last year; he was on our list of folks to sponsor and we were like, "let's see who he works for" and ... oh hey, he works here! it was pretty amusing. :) | 17:26 |
openstackgerrit | Merged openstack-infra/nodepool: Log image delete exception as exception https://review.openstack.org/398596 | 17:31 |
openstackgerrit | Andreas Jaeger proposed openstack-infra/nodepool: Add tools/test-setup.sh https://review.openstack.org/399079 | 17:42 |
pabelanger | Shrews: have a min? | 17:57 |
pabelanger | trying to figure out why I am getting an exception calling zk.deleteBuild() | 17:58 |
pabelanger | http://paste.openstack.org/show/589629/ | 17:58 |
openstackgerrit | Andreas Jaeger proposed openstack-infra/nodepool: Add tools/test-setup.sh https://review.openstack.org/399079 | 17:58 |
Shrews | pabelanger: not atm. Out at lunch waiting for someone | 17:58 |
Shrews | pabelanger: what nodes are left under that zk build path? | 18:00 |
Shrews | pabelanger: you'll need to use zk-shell to look probably | 18:00 |
* Shrews wonders if a lock node remains | 18:01 | |
pabelanger | Shrews: http://paste.openstack.org/show/589630/ | 18:02 |
pabelanger | ya, looks like a lock | 18:02 |
pabelanger | trying to delete build 0000000001 | 18:03 |
pabelanger | let see if I can find it | 18:03 |
Shrews | pabelanger: OK, then that's a logic error in the code | 18:03 |
pabelanger | Shrews: thanks | 18:04 |
Shrews | Can think of a couple of ways to deal with that, but will need to wait until I get back to my computer | 18:04 |
pabelanger | sure, np. Enjoy lunch | 18:04 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Add printZKTree debug method for tests https://review.openstack.org/399172 | 18:19 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Add test_provider_addition to builder https://review.openstack.org/399173 | 18:19 |
jeblair | pabelanger, Shrews: ^ that works. i'm going to work on the corresponding provider removal test now, which we can use to explore the concerns we had yesterday about deleting uploaded images when we remove a provider | 18:20 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Add tools/test-setup.sh https://review.openstack.org/399079 | 18:24 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Add tools/test-setup.sh https://review.openstack.org/399079 | 18:26 |
openstackgerrit | Andreas Jaeger proposed openstack-infra/nodepool: Add tools/test-setup.sh https://review.openstack.org/399177 | 18:33 |
pabelanger | jeblair: Shrews: what do you think about adding waitForBuildDelete() (name open to bikeshed) as a helper function to watch the filesystem to make sure our builds are actually deleted? | 18:48 |
jeblair | pabelanger: maybe it can check zk and the filesystem? but yes. | 18:54 |
pabelanger | k | 18:57 |
Shrews | jeblair: me likey | 19:22 |
*** hasharAway is now known as hashar | 19:22 | |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: [WIP] Check that builds are removed from disk https://review.openstack.org/399196 | 19:25 |
pabelanger | jeblair: Shrews: This is what I've come up with so far to check that builds are removed from disk, however the test currently fails | 19:25 |
pabelanger | because getBuild() isn't empty and deleteBuild() throws an exception (still has a lock) | 19:26 |
pabelanger | want to make sure I'm on the right path before digging into the lock | 19:26 |
Shrews | pabelanger: well, that is slightly racey to begin with (deleting ondisk image and the zk node are not atomic) | 19:32 |
Shrews | but should rarely hit that, i would think (famous last words) | 19:33 |
Shrews | pabelanger: so, 2 thoughts: we could do the recursive thing you brought up yesterday, which would solve that problem. But that opens the possibility of us accidentally deleting the upload data if we haven't properly done that aspect yet. | 19:34 |
*** abregman has joined #zuul | 19:34 | |
Shrews | pabelanger: 2nd thing, we could manually delete the lock after all uploads have been deleted | 19:34 |
pabelanger | ya, let me see where we are missing the lock before we do recursive, I think that is worth the effort | 19:35 |
Shrews | the second would involve deleting the locks for each provider | 19:35 |
Shrews | actually, the entire "provider/XYZ/images" path | 19:36 |
Shrews | pabelanger: ooh, 3rd option | 19:37 |
Shrews | put a check in to deleteBuild() that does the verification that no uploads exist. if they do, throw an exception. if not, delete recursively | 19:37 |
Shrews | i like #3 | 19:37 |
Shrews | pabelanger: i can code that up real quick.. gimme a min | 19:39 |
pabelanger | okay | 19:39 |
Shrews | pabelanger: unless you were eager to do that? | 19:39 |
pabelanger | Shrews: I'll have to step away shortly for a meeting, so I can do that when I get back. Or happy to let you do so | 19:40 |
Shrews | pabelanger: i'll toss the code up and rebase your test on top of it | 19:41 |
pabelanger | WFM | 19:41 |
*** openstackgerrit has quit IRC | 19:48 | |
*** openstackgerrit has joined #zuul | 19:49 | |
jeblair | pabelanger, Shrews: cool -- i think i'm very close to having the remove provider test working, which does require some changes in deleting. i think i can push that up after my lunch. so hopefully around then we can look at both that and the lock issue together (since they are all around the same bit of code). | 19:56 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: [WIP] Check that builds are removed from disk https://review.openstack.org/399196 | 20:03 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: Add check for uploads to deleteBuild() https://review.openstack.org/399213 | 20:03 |
Shrews | pabelanger: jeblair: ^^^ | 20:03 |
Shrews | crossing my fingers pabelanger's new test passes now | 20:03 |
Shrews | gah. wrong log level | 20:05 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: [WIP] Check that builds are removed from disk https://review.openstack.org/399196 | 20:07 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: Add check for uploads to deleteBuild() https://review.openstack.org/399213 | 20:07 |
*** abregman is now known as abregman|afk | 20:08 | |
Shrews | pabelanger: your test passes now! i had a stupid pep8 failure to get the -1 from jenkins though | 20:35 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: Add check for uploads to deleteBuild() https://review.openstack.org/399213 | 20:35 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: [WIP] Check that builds are removed from disk https://review.openstack.org/399196 | 20:36 |
* Shrews needs to alias 'git review' to 'tox -epep8 ; git review' | 20:37 | |
timrc | I use vim-spf13 and it auto-includes an inline pep8 check which is handy. | 20:43 |
timrc | Shrews: But would you want tox -e pep8 && git review so it short circuits? | 20:44 |
pabelanger | and back | 20:44 |
Shrews | timrc: it was a half-jest, not subject to checks for accuratenesss :) | 20:44 |
* timrc crawls back into his hole | 20:45 | |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Add test_provider_addition to builder https://review.openstack.org/399173 | 20:45 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Add printZKTree debug method for tests https://review.openstack.org/399172 | 20:46 |
pabelanger | Shrews: Yay, that is awesome | 20:46 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Remove obsolete uploaded images https://review.openstack.org/399230 | 20:46 |
jeblair | pabelanger, Shrews: okay, that's there ^ (the other 2 are rebased on branch tip). i'll look at what you've been up to now. :) | 20:47 |
Shrews | jeblair: looking | 20:48 |
Shrews | jeblair: i think you need to rebase on 398437 | 20:50 |
Shrews | because that assertEqual() of the images won't really work w/o it | 20:50 |
Shrews | https://review.openstack.org/#/c/399230/1/nodepool/tests/test_builder.py | 20:51 |
jeblair | Shrews, pabelanger: two questions: 1) would it be safer to just delete the lock right before the build delete? 2) don't we need to continue to trap NoNodeError? | 20:51 |
Shrews | jeblair: 1) there would be a lock for each provider, 2) no, b/c i added an exists() check above | 20:52 |
jeblair | Shrews: but this could be racing with other builders, right? | 20:52 |
Shrews | jeblair: hmm, fair point | 20:54 |
Shrews | as far as point #2. i don't think #1 makes a difference, does it? | 20:54 |
jeblair | Shrews: yeah, i think we can ignore #1 for now. i lean toward needing #2 because of other builders. | 20:56 |
jeblair | Shrews: re 437 -- hrm, yes i was counting on it and thought it merged. now i don't know why the upload deletion works at all (i changed it to use imageupload.build_id) | 20:57 |
Shrews | yeah. actually, that exists check added is not needed | 20:57 |
Shrews | jeblair: also, i just realized that __eq__ doesn't take provider into account, so we could get some false equality there | 20:58 |
jeblair | Shrews: oh, right! 437 only adds eq. build_id *did* already merge. | 20:58 |
jeblair | so, yeah, that all makes sense. i will rebase on 437 and improve it. | 20:58 |
Shrews | ++ | 20:58 |
pabelanger | Shrews: Haha, we need to update test_dib_image_delete now because deleted won't be in the output. Which now make sense based on the previous patch | 20:59 |
pabelanger | http://logs.openstack.org/96/399196/4/check/gate-nodepool-python27-db-ubuntu-xenial/bcf992a/testr_results.html.gz | 20:59 |
jeblair | wow, there are *two* ...437 changes in my zuulv3 dashboard :) | 20:59 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: Add check for uploads to deleteBuild() https://review.openstack.org/399213 | 21:00 |
jeblair | pabelanger: can we paint the bikeshed 'waitforBuildDeletion'? | 21:00 |
pabelanger | WFM | 21:00 |
jeblair | pabelanger: i just wrote 'waitforUploadDeletion'. reckon they should match. | 21:01 |
pabelanger | Yup | 21:01 |
jeblair | er waitForBuildDeletion/waitForImageDeletion. | 21:01 |
Shrews | pabelanger: we need to add a wait there in test_dib_image_delete, since that, too, is racey | 21:01 |
pabelanger | ya | 21:02 |
Shrews | this is why i didn't tackle any more test_commands tests. lots of moving targets atm | 21:03 |
Shrews | wheeee | 21:03 |
jeblair | pabelanger: we're going to need to reconcile 399196 and 399230 -- we both touched _cleanupProvider. | 21:03 |
Shrews | jeblair: are you adding provider to __eq__ in your improvement? | 21:03 |
pabelanger | jeblair: Ya, I can rebase a top of yours | 21:03 |
jeblair | Shrews: yes, i was planning on doing that | 21:04 |
Shrews | ossum | 21:04 |
jeblair | pabelanger: i normalized to passing in a provider object. we can switch mine to accept provider_name, but is there a reason you went with that? | 21:05 |
jeblair | pabelanger: oh, i see another thing we can converge on -- i saved the images dir on the test object because i needed that for the configuration reload. you saved the nodepool builder object and used the imagesdir attribute on its config. | 21:06 |
jeblair | pabelanger: we could use either approach to solve both problems | 21:07 |
pabelanger | jeblair: I found we only used the provider_name, and there was a 2 bugs where we just passed provider. So switching it to provider_name seems to make sense. However, don't mind reverting to original way and using provider.name | 21:07 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: Add check for uploads to deleteBuild() https://review.openstack.org/399213 | 21:08 |
jeblair | pabelanger: i have a *slight* preference for my approach of saving the directory since it involves less reaching into the nodepool builder. (i mean, i think that's fine to do, but in this case, i think it's easy enough to do without reaching into the ._config object). | 21:08 |
pabelanger | jeblair: sure, lets use the original way | 21:09 |
pabelanger | and use your directory too | 21:09 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: Add check for uploads to deleteBuild() https://review.openstack.org/399213 | 21:09 |
jeblair | pabelanger: give me a second before you rebase | 21:10 |
pabelanger | k | 21:10 |
jeblair | pabelanger: i'm rebasing and have a conflict | 21:10 |
openstackgerrit | Merged openstack-infra/nodepool: Catch all upload exceptions https://review.openstack.org/398148 | 21:10 |
jeblair | oh let's do this | 21:10 |
jeblair | pabelanger: can you +3 398437 ? | 21:11 |
jeblair | i need that to be merged with branch tip before i can do this | 21:11 |
pabelanger | done | 21:13 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: Fix getMostRecentImageUpload return docstring https://review.openstack.org/399243 | 21:13 |
Shrews | ^^^ fixes a suspected copy-pasta | 21:13 |
jhesketh | Morning | 21:23 |
pabelanger | o/ | 21:24 |
Shrews | darn you jenkins | 21:25 |
Shrews | actually, darn you mysql | 21:25 |
Shrews | pabelanger: 437 hit the db timeout again | 21:25 |
Shrews | rechecked | 21:27 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Add provider_name to ImageUpload https://review.openstack.org/399250 | 21:28 |
pabelanger | Shrews: :( | 21:28 |
jeblair | that's problematic.... | 21:29 |
jeblair | pabelanger, Shrews: i'm going to rebase 437 on branch tip | 21:29 |
pabelanger | ack | 21:29 |
jeblair | since we've hit the timeout, the cost is sunk already, and this will let us restack everything | 21:29 |
Shrews | ai'ight | 21:29 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Re-enable test: test_image_build https://review.openstack.org/398437 | 21:30 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Re-enable tests: test_job_create, test_job_delete https://review.openstack.org/398440 | 21:31 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Add provider_name to ImageUpload https://review.openstack.org/399250 | 21:31 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Remove test: test_snapshot_image_update https://review.openstack.org/398445 | 21:31 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Add test_provider_addition to builder https://review.openstack.org/399173 | 21:31 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Add printZKTree debug method for tests https://review.openstack.org/399172 | 21:31 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Remove obsolete uploaded images https://review.openstack.org/399230 | 21:32 |
jeblair | pabelanger: okay i think you can rebase your changes on that | 21:33 |
jeblair | pabelanger: i think that would include 213 and 230, right? | 21:33 |
pabelanger | 213 and 196 | 21:34 |
jeblair | yep, that's right, sorry. | 21:34 |
Shrews | jeblair: comments on 399250 | 21:35 |
jeblair | Shrews: rgr will do | 21:35 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Remove obsolete uploaded images https://review.openstack.org/399230 | 21:37 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Add provider_name to ImageUpload https://review.openstack.org/399250 | 21:37 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Add test_provider_addition to builder https://review.openstack.org/399173 | 21:37 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Add printZKTree debug method for tests https://review.openstack.org/399172 | 21:37 |
jeblair | pabelanger: sorry, one more rebase there ^ | 21:37 |
pabelanger | np | 21:37 |
*** jamielennox is now known as jamielennox|away | 21:40 | |
jeblair | i'll be back in 15 | 21:41 |
openstackgerrit | Merged openstack-infra/nodepool: Fix getMostRecentImageUpload return docstring https://review.openstack.org/399243 | 21:42 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Update test_dib_image_delete to validate delete happened https://review.openstack.org/399196 | 21:56 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Add check for uploads to deleteBuild() https://review.openstack.org/399213 | 21:56 |
pabelanger | jeblair: Shrews: okay, I think that fixes our race condition and confirms our build has been removed | 21:56 |
clarkb | ok https://github.com/ansible/ansible/issues/18281 now has a pull request and more thoughts on the problem based on what I have learned | 21:57 |
Shrews | pabelanger: is that assert_listed() the equivalent of "not listed" | 22:00 |
pabelanger | Shrews: ya, is there a better way to do that? | 22:00 |
Shrews | ah, just saw the comment. nah, just seems odd to with the method name: "assert_listed, but not really" | 22:01 |
Shrews | "odd to with" ?? me no speaky goodly | 22:02 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Update test_dib_image_delete to validate delete happened https://review.openstack.org/399196 | 22:02 |
Shrews | 399250 hit the db timeout again. this is too problematic. we'll have to fix that, i believe | 22:06 |
*** lindycoder has joined #zuul | 22:07 | |
mordred | Shrews: yah. that's :( but I agree | 22:07 |
Shrews | has anyone looked into it before? | 22:08 |
openstackgerrit | Merged openstack-infra/nodepool: Re-enable test: test_image_build https://review.openstack.org/398437 | 22:08 |
openstackgerrit | Merged openstack-infra/nodepool: Re-enable tests: test_job_create, test_job_delete https://review.openstack.org/398440 | 22:08 |
jeblair | clarkb: ^ what do you know about the db timeout thing? | 22:09 |
openstackgerrit | Merged openstack-infra/nodepool: Remove test: test_snapshot_image_update https://review.openstack.org/398445 | 22:09 |
jeblair | it looks like a deadlock | 22:09 |
clarkb | jeblair: https://review.openstack.org/#/c/280989/ is the stack where I was trying to work it out | 22:10 |
jeblair | 'drop database' is running but sitting there until the test timeout | 22:10 |
clarkb | jeblair: the parent of that change tries to dump a bunch of extra mysql debugging data but none of the data I was able to collect amde it clear why that was happening | 22:10 |
jeblair | (after the actual test ran and completed in a few seconds) | 22:10 |
clarkb | and yes I think its somethign else holding open the db | 22:10 |
clarkb | maybe a stray thread or similar | 22:10 |
clarkb | there are probably different and more better things that could be added to https://review.openstack.org/#/c/278146/7/nodepool/tests/__init__.py that would help more | 22:11 |
Shrews | do the tests create a schema for each test? or is a single schema shared among them? | 22:11 |
clarkb | but my familiarity with debugging mysql deadlocks is minimal | 22:11 |
clarkb | Shrews: schema per test | 22:11 |
Shrews | well that's very odd then | 22:11 |
clarkb | Shrews: and at the end of each test it attempts to drop the schema | 22:11 |
clarkb | whcih is what is hanging | 22:11 |
clarkb | jeblair: I think I also tested that if I set the test timeout to some ridiculous number it never actually manages to drop the db | 22:12 |
clarkb | jeblair: pretty sure I did that locally | 22:12 |
jeblair | i wonder how well mysql is tested/protected against deadlocks in drop database (it's possible we're the only people who tried to drop 8 schemas at the same time?) | 22:12 |
clarkb | let it sit there for an hour before deciding it wasn't going to happen | 22:12 |
clarkb | jeblair: we do have a knack for finding all the bugs | 22:13 |
jeblair | unfortunately, all the runners are difference processes, so it would not be easy to serialize across them... :/ | 22:14 |
jeblair | different | 22:14 |
jeblair | clarkb, Shrews: we (ugh) could (maybe) try to prefix the table names instead of per-test schemas. | 22:15 |
clarkb | or do schema per process and reduce contention on drop dbs | 22:16 |
Shrews | jeblair: or use sqlite! | 22:16 |
Shrews | mordred: ya know, we wouldn't have this problem if mysql supported transactions | 22:16 |
* Shrews hides | 22:16 | |
jeblair | Shrews: nodepool does not even *remotely* think about working on sqlite. :) | 22:17 |
jeblair | (we used to. it did not last long) | 22:17 |
mordred | Shrews: yah. adding transaction support sounds great | 22:17 |
jeblair | (nodepool has ~1000 concurrent open connections to mysql, each writing to one row in the same table) | 22:18 |
Shrews | my bet is on innodb doing some processing that's taking a while | 22:18 |
jeblair | Shrews: but more than an hour? | 22:19 |
Shrews | an hour? oh | 22:19 |
jeblair | Shrews: yeah, clarkb let this run locally for an hour once | 22:19 |
clarkb | its reproduceable locally | 22:20 |
clarkb | I think I set the timeout var to big number then did while tox -epy27 | 22:20 |
clarkb | and ya it sat there for a long time once I hit it | 22:20 |
openstackgerrit | Martin Roy proposed openstack-infra/zuul: Add a mutex release upon cancelling a job via gearman https://review.openstack.org/399274 | 22:21 |
lindycoder | ^ i'm not sure how to tag this as resolving this : https://storyboard.openstack.org/#!/story/2000657 | 22:22 |
pabelanger | jeblair: looks like 399230 is missing node_two_provider.yaml | 22:23 |
clarkb | jeblair: Shrews mordred I definitely felt like I got in over my head once the first set of basic debugging didn't lead anywhere. Some of you are mysql experts, I am not one of them :) | 22:24 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul: Add a mutex release upon cancelling a job via gearman https://review.openstack.org/399274 | 22:24 |
jeblair | lindycoder: ^ like that! and thanks! :) | 22:24 |
clarkb | but the issue also seemed to mostly go away | 22:24 |
*** jamielennox|away is now known as jamielennox | 22:24 | |
clarkb | wee timing races are the best | 22:24 |
lindycoder | jeblair, thanks! | 22:24 |
jeblair | and look, i did it right and the bot did its thing! https://storyboard.openstack.org/#!/story/2000657 | 22:25 |
lindycoder | amazing! | 22:25 |
jeblair | pabelanger: oh, node_two_provider_remove | 22:26 |
pabelanger | Oh, yes | 22:27 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Update test_dib_image_delete to validate delete happened https://review.openstack.org/399196 | 22:28 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Remove obsolete uploaded images https://review.openstack.org/399230 | 22:28 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Add check for uploads to deleteBuild() https://review.openstack.org/399213 | 22:28 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Add provider_name to ImageUpload https://review.openstack.org/399250 | 22:28 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Add test_provider_addition to builder https://review.openstack.org/399173 | 22:28 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Add printZKTree debug method for tests https://review.openstack.org/399172 | 22:28 |
jeblair | pabelanger: fixed, rebased whole stack | 22:28 |
pabelanger | Yay | 22:28 |
pabelanger | danke | 22:28 |
lindycoder | jeblair, tests passes, do i need to add anything to the PR to your opinion? | 22:35 |
jeblair | lindycoder: nope, should be all set. i just need to spend a few minutes thinking about it with a fresh mind (mutexes are hard) | 22:36 |
*** hashar has quit IRC | 22:37 | |
lindycoder | Sure thing, an alternative to that would be to keep the mutex handling logic in the scheduler and have gearman call a new onBuildCancelled event that manages that | 22:41 |
SpamapS | adam_g: make sure to check https://storyboard.openstack.org/#!/story/2000773 before you pick up a test. It's been kept pretty up to date | 22:41 |
SpamapS | Shrews: innodb doing things slow? | 22:42 |
SpamapS | Shrews: not possible! :) | 22:42 |
lindycoder | sorry i gotta leave for today i'll be back tomorrow | 22:43 |
*** lindycoder has quit IRC | 22:44 | |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Remove image delete tests from test_nodepool https://review.openstack.org/398657 | 22:44 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Use a table prefix instead of per-test schema https://review.openstack.org/399286 | 22:48 |
jeblair | Shrews, clarkb, mordred: <shrug> ^ | 22:49 |
openstackgerrit | Merged openstack-infra/nodepool: Add provider_name to ImageUpload https://review.openstack.org/399250 | 23:00 |
openstackgerrit | Merged openstack-infra/nodepool: Add printZKTree debug method for tests https://review.openstack.org/399172 | 23:01 |
*** willthames has joined #zuul | 23:13 | |
openstackgerrit | Merged openstack-infra/nodepool: Add test_provider_addition to builder https://review.openstack.org/399173 | 23:17 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Use a table prefix instead of per-test schema https://review.openstack.org/399286 | 23:28 |
jeblair | okay, that should actually pass tests now. the docs for this method are an interesting read: http://docs.sqlalchemy.org/en/latest/orm/mapping_api.html#sqlalchemy.orm.clear_mappers | 23:29 |
jeblair | clarkb, mordred: ^ | 23:29 |
mordred | jeblair: interesting | 23:38 |
jeblair | mordred, clarkb, Shrews: i thought of an alternative to that, if we want to keep the schemas -- we could have a well-known lockfile used by the tests. so the cleanup method locks the lockfile on disk first before executing the drop schema. [we could probably do something similar in the database itself, actually]. | 23:39 |
jeblair | honestly, i don't care too much since it should all be going away in the not-too-distant future | 23:40 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Add image addition test to builder https://review.openstack.org/399294 | 23:41 |
mordred | jeblair: yah - that's my feeling too- it's going away soon, so whatever fixes it | 23:59 |
SpamapS | jeblair: so.. irrelevant-files ... | 23:59 |
Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!