openstackgerrit | Joshua Hesketh proposed openstack-infra/zuul: Merge branch 'master' into workingv3 https://review.openstack.org/389470 | 00:18 |
---|---|---|
*** saneax-_-|AFK is now known as saneax | 00:24 | |
openstackgerrit | Adam Gandelman proposed openstack-infra/zuul: Re-enable test_success_pattern as test_success_url https://review.openstack.org/400455 | 00:46 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul: Correct logic problem with job trees https://review.openstack.org/400456 | 00:53 |
jeblair | SpamapS: ^ maybe we can take that and apply your additional tests to it | 00:54 |
clarkb | oh sorry for mic drop,larissa hauled me away | 01:21 |
SpamapS | jeblair: I'll look soon. Was sick this morning and now been driving the last 3 hours. :-p | 01:46 |
jeblair | clarkb: no it was perfect :) | 03:07 |
jeblair | SpamapS: no rush; get well. | 03:08 |
SpamapS | jeblair: seems to have been a 24hr bug that finally let go this afternoon. :-P | 03:09 |
*** saneax is now known as saneax-_-|AFK | 05:54 | |
*** abregman has joined #zuul | 06:23 | |
openstackgerrit | Joshua Hesketh proposed openstack-infra/nodepool: Merge branch 'master' into feature/zuulv3 https://review.openstack.org/400536 | 06:46 |
*** willthames has quit IRC | 06:50 | |
*** hashar has joined #zuul | 07:12 | |
openstackgerrit | Andreas Jaeger proposed openstack-infra/nodepool: Add tools/test-setup.sh https://review.openstack.org/399079 | 07:52 |
openstackgerrit | Andreas Jaeger proposed openstack-infra/nodepool: Add tools/test-setup.sh https://review.openstack.org/399177 | 07:56 |
*** hogepodge has quit IRC | 08:45 | |
*** rcarrillocruz has quit IRC | 09:29 | |
*** ricky1 has joined #zuul | 09:33 | |
*** ricky1 is now known as rcarrillocruz | 09:33 | |
*** yolanda has quit IRC | 10:35 | |
*** yolanda has joined #zuul | 10:38 | |
*** rcarrillocruz has quit IRC | 11:46 | |
*** rcarrillocruz has joined #zuul | 11:46 | |
*** abregman is now known as abregman|mtg | 12:57 | |
*** yolanda has quit IRC | 12:58 | |
*** abregman|mtg has quit IRC | 13:02 | |
*** abregman has joined #zuul | 14:12 | |
pabelanger | morning | 14:13 |
pabelanger | looking at http://logs.openstack.org/34/399734/1/gate/gate-nodepool-python27-db-ubuntu-xenial/a9db8e9/console.html | 14:13 |
pabelanger | seems we are uploading the same build twice | 14:14 |
pabelanger | not sure if we recently fixed that or not | 14:14 |
pabelanger | will recheck now | 14:14 |
*** yolanda has joined #zuul | 14:17 | |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Re-enable test_hold test https://review.openstack.org/399758 | 14:17 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Enable test_delete / test_delete_now tests https://review.openstack.org/399734 | 14:17 |
Shrews | pabelanger: that was probably (hopefully?) fixed with my two bugs squashed yesterday | 14:21 |
Shrews | not sure if that review needs a rebase to get those or not | 14:23 |
*** yolanda has quit IRC | 14:36 | |
*** yolanda has joined #zuul | 14:43 | |
openstackgerrit | Merged openstack-infra/nodepool: Add tools/test-setup.sh https://review.openstack.org/399079 | 15:02 |
Shrews | umm, anyone know where getSnapshotImageByExternalID gets defined in nodepool? is that autogenerated from the db layer or something? | 15:03 |
Shrews | oh, it was removed in a previous commit. *phew* | 15:08 |
Shrews | jeblair: so, the alien-image-list command seemed to operate on snapshot images. since support for that was removed, can that command (and associated tests) be removed? | 15:10 |
Shrews | if not, what should it be doing instead at this point? | 15:11 |
jeblair | Shrews: i think it should be comparing the images it finds in the clouds to uploads | 15:11 |
jeblair | Shrews: (that was probably another case of us overloading the term snapshot to also mean 'dib we uploaded' -- since dib was an add-on) | 15:11 |
*** yolanda has quit IRC | 15:12 | |
Shrews | jeblair: ok. that's what i *assumed* it should be trying to do, but the whole snapshot thing threw me off | 15:12 |
*** yolanda has joined #zuul | 15:52 | |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: Re-enable alien-image-list command and tests https://review.openstack.org/400836 | 16:00 |
Shrews | jeblair: pabelanger: see my NOTE note in that review ^^^ | 16:01 |
jeblair | Shrews: ack, i'll take a look at that later | 16:04 |
*** yolanda has quit IRC | 16:04 | |
jeblair | (i mean, i'll take a look at seeing if there's a way to add the positive test case) | 16:07 |
*** yolanda has joined #zuul | 16:29 | |
*** abregman has quit IRC | 16:33 | |
*** abregman has joined #zuul | 16:33 | |
*** yolanda has quit IRC | 16:42 | |
*** yolanda has joined #zuul | 16:51 | |
*** hashar is now known as hasharAway | 16:57 | |
*** yolanda has quit IRC | 17:40 | |
*** abregman is now known as abregman|afk | 17:59 | |
*** yolanda has joined #zuul | 18:20 | |
phschwartz | jeblair: ok, so if you have some time. I am open right now. I basically want to throw out what I have been attempting because all feel really really bad and get your thoughts based on that original review that did part of the work of trying to turn it into a graph | 18:22 |
*** yolanda has quit IRC | 18:23 | |
*** yolanda has joined #zuul | 18:23 | |
jeblair | phschwartz: sure thing | 18:24 |
clarkb | Shrews: jeblair re alient list yup. Basically snapshot has/had two meanings 1) actual snapshot images and 2) dib uploads | 18:26 |
phschwartz | so from reading the commit message and the code, the point of the change was to fix the issue where 2 jobs had the same child but the child would run from which ever finished first instead of when both were done | 18:28 |
phschwartz | does that sound correct? | 18:28 |
jeblair | phschwartz: yes, (i'm not sure whether it was the one that finished first vs the one that was defined first -- but regardless, it is to express that we want A, B+C, then D to run when B and C are finished. | 18:30 |
phschwartz | So the previous review tried to move the jobTree to a jobGraph with some cyclic dependency protection, but only changed about 1/6th the code that uses jobTree's and in not an easy maner to replace. | 18:32 |
phschwartz | Do you think it is cleaner to continue down that process and replace all the jobTree usage with a graph like object that can be used expressively to list a job chain/order? | 18:33 |
phschwartz | I was attempting to fix it in the job tree itself and without adding extra objects to try and track cross dependent children it turned into a nightmare | 18:34 |
*** yolanda has quit IRC | 18:34 | |
jeblair | phschwartz: i'm not sure how best to answer your question. i think this is a fundamental alteration of jobtrees -- we are, in fact, changing them to a graph structure. so i think whatever investment in altering the data model to accomodate that clearly is worth it (i don't think we should try to shoehorn this into the existing jobtree structure if it doesn't fit). however, we do still need something that lays out the order in which jobs are ... | 18:37 |
jeblair | ... to be run, and can be used to easily answer the question "these jobs have completed, what jobs can be run now?" | 18:37 |
phschwartz | jeblair: ok, that gives me a clearer picture of exactly what you want. Instead of a normal open graph like the review that was presented, I am going to take a shot at moving the jobTree to a weighted or dependency graph. I think that will give us the best function for solving this. Do you have an issue with the format of the pipeline defn changing a bit to allow clearer reading while still using a yaml dict | 18:40 |
phschwartz | format? | 18:40 |
jeblair | phschwartz: indeed, i think even more specifically a DAG is what we want, yeah? | 18:42 |
phschwartz | jeblair: That is what I was leaning towards | 18:43 |
jeblair | phschwartz: and yeah, we can consider format changes -- maybe make some mockups in an etherpad and run them by me before you get too far into it? | 18:44 |
phschwartz | that was my thoughts on it | 18:45 |
*** yolanda has joined #zuul | 18:55 | |
*** yolanda has quit IRC | 18:59 | |
mordred | job DAG will make many people happy | 19:12 |
phschwartz | mordred: I think it will. | 19:14 |
*** yolanda has joined #zuul | 19:16 | |
*** hasharAway is now known as hashar | 19:22 | |
*** yolanda has quit IRC | 19:32 | |
*** yolanda has joined #zuul | 19:42 | |
*** yolanda has quit IRC | 19:54 | |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: Re-enable test: test_list_nodes https://review.openstack.org/400955 | 20:32 |
Shrews | missed that one ^^^ | 20:33 |
*** abregman|afk has quit IRC | 20:40 | |
Shrews | there is also the test_image_upload_fail test, but i'm not quite sure how to fix that one yet | 20:45 |
pabelanger | +2 on test_list_nodes | 20:48 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Re-enable TestWebApp tests https://review.openstack.org/399716 | 21:01 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Remove waitForBuiltImages() / JobTracker() from nodepool.py https://review.openstack.org/399727 | 21:01 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Add test to validate when a node build is disabled https://review.openstack.org/399642 | 21:01 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Re-enable test_hold test https://review.openstack.org/399758 | 21:04 |
pabelanger | Shrews: regarding: https://review.openstack.org/#/c/399974/ where is the build_number generated for builds today? If 0000000001 isn't guaranteed, is there no safe way to force it to be the default? | 21:12 |
Shrews | pabelanger: it's a sequence znode managed by zookeeper. we have no control over it | 21:13 |
pabelanger | Hmm | 21:15 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: [WIP] Re-enable test: test_image_upload_fail https://review.openstack.org/400970 | 21:16 |
Shrews | pabelanger: clarkb: jeblair: my testing for ^^^ shows that the fakeuploadfailcloud is not getting swapped in for that test (thus it fails). not sure what the issue there is but it must have worked before, yeah? | 21:18 |
pabelanger | Shrews: Ya, we make the assumption in a few places already with fake-image-0000000001, for example. So, we'll also need to clean that up too. test_dib_image_delete and test_image_removal | 21:19 |
Shrews | pabelanger: erm, didn't notice that | 21:19 |
Shrews | pabelanger: dib-image-delete looks like an easy fix. just use build.id there | 21:20 |
clarkb | Shrews: on line 111 make sure that object path is still the same path used now? | 21:21 |
Shrews | err, builds[0].id | 21:21 |
Shrews | clarkb: yup, the same. though i think the _get_one_cloud() path is different? | 21:22 |
jhesketh | Morning | 21:23 |
jeblair | Shrews: we haven't seen a test fail because the sequence started at 0 yet, have we? | 21:28 |
Shrews | jeblair: i saw it while developing zk.py | 21:28 |
Shrews | so, not a test failure, no | 21:28 |
jeblair | Shrews: maybe there was something special about the conditions when you saw that | 21:29 |
Shrews | jeblair: i didn't bother digging into the zookeeper code (java, bleh!) to figure the conditions | 21:29 |
jeblair | i confess, it bothers me that something that is guaranteed to be a monotonically increasing number would have an undefined behavior like that. and i like that we can use it to assert that the image we built is the *first* image we built. if we read the value, then all we can do is assert that the second image was the one that was built after the first. | 21:30 |
Shrews | jeblair: ZK makes no guarantees as far as the docs (https://zookeeper.apache.org/doc/r3.2.1/zookeeperProgrammers.html#Sequence+Nodes+--+Unique+Naming) and harlowja have informed me | 21:31 |
harlowja | ya, i think its really based on the modifications to the parent znode | 21:31 |
harlowja | afaik that's where the number comes from | 21:32 |
harlowja | delete the parent and maybe it will start at 0000000001 | 21:32 |
harlowja | i forget though | 21:32 |
harlowja | ha | 21:32 |
harlowja | check out the java code :-P | 21:32 |
Shrews | no, that's why i summoned you! lol | 21:34 |
* Shrews is allergic to java | 21:34 | |
harlowja | lol | 21:34 |
pabelanger | jeblair: agreed, it would be good to know what the first build ID would be for a new image. It would make writing tests easier :) | 21:36 |
pabelanger | that's actually was I was trying to do, some how prime a build number with storeBuild(), then delete that data. We'd know what the current build_id would be | 21:37 |
pabelanger | docs to reference 0000000001 | 21:40 |
pabelanger | and not 0000000000 | 21:40 |
harlowja | somewhere in https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/DataTree.java i think | 21:40 |
Shrews | pabelanger: as an example, yes | 21:40 |
Shrews | Even if we knew the conditions, we're not guaranteed that the zk devs wouldn't change them at some point | 21:43 |
harlowja | ya | 21:43 |
harlowja | somewhere in there, lol | 21:43 |
pabelanger | Shrews: jeblair: and was it the downside, to not having zookeeper generate the id? | 21:45 |
Shrews | i think that's even why i made zk.storeBuild() return the build number | 21:45 |
pabelanger | or upside, depending on how we look at it | 21:45 |
Shrews | pabelanger: it started out NOT using zk for the sequence numbers | 21:46 |
pabelanger | okay, so we switched to it. Was that discussion logged any place? | 21:47 |
pabelanger | mostly curious on the history of using it | 21:47 |
Shrews | pabelanger: see fac06108552b7482086d840dcfe5b2eaa1d11b36 | 21:47 |
pabelanger | k | 21:47 |
*** hashar has quit IRC | 21:49 | |
Shrews | the getMaxBuildID() used to be necessary to query ZK to calculate the _next_ ID. pretty inefficient | 21:49 |
Shrews | for one | 21:49 |
harlowja | u guys going to make your own sequence number stuff (transcationally?) | 21:50 |
Shrews | and i think that involved more locks | 21:50 |
harlowja | i wouldn't recomend it :-P | 21:50 |
jeblair | zk does have a test where they hardcod 0000000001 -- https://github.com/apache/zookeeper/blob/ec20c5434cc8a334b3fd25e27d26dccf4793c8f3/src/java/test/org/apache/zookeeper/test/AsyncOps.java#L806 | 21:50 |
harlowja | ya, more locks needed | 21:50 |
Shrews | nah, i don't want to create a new GUID | 21:50 |
pabelanger | jeblair: interesting | 21:51 |
jeblair | (well, separately from this discussion, we also have the potential desire to make global unique build and upload ids, which might involve a global sequence oracle, or a uuid; neither of those things solve this question/problem though) | 21:51 |
Shrews | well that's all well and good. doesn't mean 0000000000 could never be generated | 21:51 |
Shrews | if we want to assume 1 for the tests, we might get away with that for a long time. i'm just relating my experience. i really don't think non-test code should EVER depend on that though | 21:52 |
jeblair | Shrews: i agree with that. | 21:53 |
jeblair | Shrews: i'm just tending to think that it might be okay for tests (until it's not), and we get the benefit of it being easier to write/understand, plus a little extra assertiveness on the part of the test. | 21:53 |
jeblair | i love that i've spent 20 minutes trying to find the place in zk where it sets or increments the seqno and have not succeeded yet. | 21:54 |
harlowja | me too | 21:54 |
harlowja | lol | 21:54 |
harlowja | 10 minutes though | 21:54 |
harlowja | i have a feeling https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/util/ZxidUtils.java#L25-L27 might be used? | 21:55 |
harlowja | but not sure, ha | 21:55 |
harlowja | let me know if u find it :-P | 21:55 |
jeblair | i have exceeded the number of windows my wm will let me open | 21:56 |
harlowja | ha | 21:56 |
jeblair | i did not know there was a max | 21:56 |
jeblair | it's apparently 144 per desktop | 21:56 |
*** yolanda has joined #zuul | 21:57 | |
pabelanger | wow, impressive | 21:57 |
Shrews | apparently stored using a 12x12 matrix | 21:58 |
clarkb | jeblair: now you want me to see how many xmonad will do | 21:59 |
clarkb | er | 21:59 |
clarkb | you have made me want to see | 21:59 |
jeblair | clarkb: yes. also, i double dog dare you. | 21:59 |
pabelanger | https://review.openstack.org/#/q/status:open+project:openstack-infra/nodepool+branch:feature/zuulv3+topic:v3-tests is 2 tests that are ready for review / merge. Minimal changes | 22:04 |
pabelanger | 5 now | 22:05 |
pabelanger | topic update | 22:05 |
harlowja | jeblair https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/PrepRequestProcessor.java#L671-L675 | 22:06 |
harlowja | https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/PrepRequestProcessor.java#L672-L675 | 22:06 |
openstackgerrit | Adam Gandelman proposed openstack-infra/zuul: Re-enable test_success_pattern as test_success_url https://review.openstack.org/400455 | 22:14 |
jeblair | harlowja, Shrews, pabelanger: so it looks like it is the version number of the parent (for us, a node which does nothing other than contain the builds/uploads). so whether it starts at 0, 1, or something else is likely related to how exactly we end up creating that node (and whether we do anything else with it in the future). in other words, we're more likely to make that unstable than zookeeper. having said that, i'm now leaning toward ... | 22:18 |
jeblair | ... agreeing with Shrews that we should try to avoid it. | 22:18 |
harlowja | or make new parent nodes for testing | 22:18 |
*** willthames has joined #zuul | 22:19 | |
harlowja | and then u'll probably get 000001 | 22:19 |
harlowja | otherwise prob not, lol | 22:19 |
harlowja | prob more/less test what u actually care about (that its incrementing?) | 22:19 |
openstackgerrit | Adam Gandelman proposed openstack-infra/zuul: Re-enable test_success_pattern as test_success_url https://review.openstack.org/400455 | 22:22 |
openstackgerrit | Adam Gandelman proposed openstack-infra/zuul: Re-enable test_success_pattern as test_success_url https://review.openstack.org/400455 | 22:23 |
adam_g | jeez. third times a charm | 22:23 |
pabelanger | jeblair: harlowja: Shrews: okay, let me think about another way to waitForBuild without using build_id. Maybe we need to stop using builder.DibImageFile.from_image_id() | 22:33 |
jeblair | pabelanger: i'd say let's do that, but not let it block us for now. | 22:35 |
pabelanger | k | 22:36 |
*** yolanda has quit IRC | 22:58 | |
Shrews | pabelanger: waitForImage implies the build. just have it return the image if you need the build id | 23:02 |
*** jamielennox is now known as jamielennox|away | 23:24 | |
*** jamielennox|away is now known as jamielennox | 23:24 | |
jamielennox | hey, is there anything about the test_requirements tests that means they've been left so late? | 23:42 |
jamielennox | anything major changing there re v3, or not suitable for beginners? | 23:42 |
Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!