*** Cibo_ has joined #zuul | 00:50 | |
Shrews | pabelanger: i have a suspicion about that random cleanup failure. waiting for it to reproduce to confirm, but i think what's happening is that sometimes the upload state_time for 001 and 002 is the same, so 001 could sort before 002 in the upload recency table, so we see it as one of the 2 most recent uploads (not 002 and 003) | 03:47 |
Shrews | (state_time is a rounded number because we int() it). getting things to happen quick enough, and at just the right time, is why it would be so hard to recreate | 03:49 |
Shrews | jeblair: maybe we shouldn't int() the state_time? | 03:51 |
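A minimal sketch of the tie Shrews describes, assuming uploads are ranked newest-first by `state_time`. The upload dicts and `most_recent()` are illustrative, not nodepool's real records or code. With `int()` applied, 001 and 002 share a key, and because Python's sort is stable (even with `reverse=True`), whichever of them comes first in the input wins the tie.

```python
# Illustrative upload records; only the fields needed for ranking.
uploads = [
    {"id": "0000000001", "state_time": 1481033836.21},
    {"id": "0000000002", "state_time": 1481033836.87},
    {"id": "0000000003", "state_time": 1481033837.40},
]

def most_recent(uploads, count, truncate):
    """Return the ids of the `count` newest uploads (hypothetical helper)."""
    def key(upload):
        t = upload["state_time"]
        # int() drops the fractional second, so 001 and 002 compare equal.
        return int(t) if truncate else t
    # sorted() is stable, so equal keys keep their input order.
    ranked = sorted(uploads, key=key, reverse=True)
    return [u["id"] for u in ranked[:count]]

print(most_recent(uploads, 2, truncate=True))   # ['0000000003', '0000000001']
print(most_recent(uploads, 2, truncate=False))  # ['0000000003', '0000000002']
```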
jamielennox | hey, just before i sink any more time into the tests is zuultrigger expected to work in the same way ? | 04:11 |
*** mordred has quit IRC | 04:26 | |
*** greghaynes has quit IRC | 04:26 | |
*** Shrews has quit IRC | 04:27 | |
*** phschwartz has quit IRC | 04:27 | |
*** phschwartz has joined #zuul | 05:00 | |
*** Shrews has joined #zuul | 05:03 | |
*** mordred has joined #zuul | 05:04 | |
*** greghaynes has joined #zuul | 05:04 | |
*** anteaya has quit IRC | 05:16 | |
*** mordred has quit IRC | 05:18 | |
*** Shrews has quit IRC | 05:18 | |
*** Shrews has joined #zuul | 05:26 | |
*** mordred has joined #zuul | 05:27 | |
*** anteaya has joined #zuul | 05:29 | |
*** saneax-_-|AFK is now known as saneax | 07:11 | |
*** bstinson has quit IRC | 07:23 | |
*** mordred has quit IRC | 07:25 | |
*** bstinson has joined #zuul | 07:28 | |
*** phschwartz has quit IRC | 07:29 | |
*** Shrews has quit IRC | 07:30 | |
*** mordred has joined #zuul | 07:34 | |
*** phschwartz has joined #zuul | 07:34 | |
*** Shrews has joined #zuul | 07:35 | |
*** abregman has joined #zuul | 08:17 | |
*** willthames has quit IRC | 09:57 | |
*** hashar has joined #zuul | 10:06 | |
*** Zara_ is now known as Zara | 10:10 | |
*** jamielennox is now known as jamielennox|away | 11:44 | |
*** abregman has quit IRC | 12:04 | |
*** Cibo_ has quit IRC | 12:16 | |
*** abregman has joined #zuul | 13:02 | |
Shrews | pabelanger: so, yep, looks like state_time issue. we could add a .5s sleep between 001 and 002 creation in the test to get rid of that | 14:24 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: Fix race in test_image_rotation https://review.openstack.org/407539 | 14:36 |
*** Cibo_ has joined #zuul | 14:37 | |
Shrews | pabelanger: ^^^ had to use a full second b/c rounding up or down isn't performed | 14:38 |
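Why a 0.5s sleep is not enough when the value is passed through `int()`: `int()` truncates instead of rounding, so two timestamps half a second apart collide or not depending on where the first one falls within its second. A full second always moves to the next integer. A small sketch (the timestamps are made up):

```python
def same_second(t1, t2):
    """True if two epoch timestamps collapse to the same value under int()."""
    # int() truncates toward zero; it never rounds up.
    return int(t1) == int(t2)

# A 0.5 s gap sometimes collides and sometimes does not...
print(same_second(1481033836.2, 1481033836.2 + 0.5))  # True
print(same_second(1481033836.6, 1481033836.6 + 0.5))  # False
# ...while a 1 s gap always lands in the next integer second.
print(same_second(1481033836.2, 1481033836.2 + 1.0))  # False
```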
*** saneax is now known as saneax-_-|AFK | 14:44 | |
openstackgerrit | Merged openstack-infra/nodepool: Delete hard upload failures from current builds https://review.openstack.org/406342 | 15:04 |
Shrews | oh, w00t for that merging | 15:05 |
pabelanger | clarkb: Ya looking at it now | 15:06 |
clarkb | pabelanger: sorry I wasn't able to track it down further | 15:07 |
Shrews | pabelanger: i'm going to fix the merge failures of our test fixes | 15:07 |
pabelanger | Shrews: ack | 15:08 |
pabelanger | clarkb: I was thinking about it more last night, maybe we should just touch the md5 / sha256 files, since we are mocking uploads to shade | 15:08 |
pabelanger | clarkb: going to try that this morning and see if there is a difference | 15:08 |
pabelanger | or use static md5 / sha256 contents | 15:08 |
clarkb | pabelanger: well that will fail the if md5 check | 15:08 |
clarkb | pabelanger: but you could put arbitrary content in them for sure | 15:09 |
pabelanger | Ya, that | 15:09 |
clarkb | ya maybe give that a try | 15:09 |
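clarkb's point is that a merely touched checksum file is empty. Reading it yields `""`, which fails an `if md5:`-style check, so the test fixture needs some arbitrary non-empty content. Here is a sketch assuming a hypothetical `<image>.md5` / `<image>.sha256` naming scheme. It is not nodepool's fake-image-create.

```python
import os
import tempfile

def write_fake_checksums(image_path, md5="not-a-real-md5", sha256="not-a-real-sha256"):
    """Write placeholder checksum files next to a fake image.

    Assumes a hypothetical <image>.md5 / <image>.sha256 naming scheme.
    """
    for ext, value in (("md5", md5), ("sha256", sha256)):
        with open(f"{image_path}.{ext}", "w") as f:
            f.write(value + "\n")

def read_checksum(path):
    with open(path) as f:
        return f.read().strip()

with tempfile.TemporaryDirectory() as tmp:
    image = os.path.join(tmp, "fake-image.qcow2")
    open(image + ".md5", "w").close()  # like `touch`: an empty file
    touched_is_falsy = not read_checksum(image + ".md5")  # "" fails `if md5:`
    write_fake_checksums(image)
    stored_md5 = read_checksum(image + ".md5")
    stored_sha = read_checksum(image + ".sha256")
```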
openstackgerrit | Merged openstack-infra/zuul: Update storyboard links in README https://review.openstack.org/407212 | 15:09 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: Fix race in test_image_rotation https://review.openstack.org/407539 | 15:11 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: Fix race condition in test_image_rotation_invalid_external_name https://review.openstack.org/407139 | 15:11 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Add --checksum support to disk-image-create https://review.openstack.org/406411 | 15:13 |
*** saneax-_-|AFK is now known as saneax | 15:30 | |
pabelanger | okay, running latest version of nodepool-builder | 15:31 |
pabelanger | just seen | 15:31 |
pabelanger | 2016-12-06 15:28:52,190 INFO nodepool.builder.CleanupWorker.0: Removing failed upload record: <ImageUpload {'build_id': u'0000000009', 'stat': ZnodeStat(czxid=955502, mzxid=955502, ctime=1481033836083, mtime=1481033836083, version=0, cversion=0, aversion=0, ephemeralOwner=0, dataLength=92, numChildren=0, pzxid=955502), 'external_name': None, 'state_time': 1481033836, 'image_name': u'centos-7', 'state': u'uploading', 'provider_name': 'rax-ord', 'external_id': None, 'id': u'0000000002'}> | 15:31 |
pabelanger | looks like the clean up worked | 15:32 |
mordred | \o/ | 15:34 |
clarkb | pabelanger: looks like coverage passed on 406411 but base py27 did not | 15:35 |
pabelanger | clarkb: ya, that is the race in https://review.openstack.org/#/c/407139/ | 15:35 |
clarkb | gotcha | 15:35 |
pabelanger | we should land that stack this morning, Shrews added another patch too | 15:36 |
*** abregman has quit IRC | 15:36 | |
*** abregman has joined #zuul | 15:37 | |
Shrews | pabelanger: fyi, took 487 runs overnight to reproduce that with the additional debug info i added :( | 15:41 |
pabelanger | Shrews: Wow, where is the log for that? | 15:42 |
Shrews | pabelanger: i have it locally | 15:42 |
pabelanger | okay | 15:42 |
Shrews | lemme see if i can find it again and paste for you | 15:42 |
Shrews | pabelanger: http://paste.openstack.org/show/591555/ | 15:45 |
Shrews | line 504 was key | 15:45 |
Shrews | but you can see the recency table included 001 and not 002 | 15:46 |
pabelanger | Ya, see that | 15:47 |
*** saneax is now known as saneax-_-|AFK | 15:47 | |
Shrews | i'm going to try keeping the state_time precision and see what breaks. using int() on it is a carry over from the old code and i'm not convinced that's necessary | 15:49 |
pabelanger | ack | 15:50 |
*** Cibo_ has quit IRC | 16:17 | |
mordred | Shrews: I'd love it if keeping precision would let us not have a sleep(1) in the tests | 16:40 |
Shrews | mordred: me too. testing it now | 16:42 |
jeblair | i think there was actually a sleep(1) or maybe even a sleep(2) in the original nodepool snapshot code to make sure that each snapshot had a unique timestamp. so this is an improvement. :) but yeah, i think adding the sleep in the test is okay, but if we don't need it, even better. | 16:42 |
pabelanger | So, doing some ops things on nb01.o.o, I just deleted the latest ubuntu-xenial image in infracloud-vanilla. nodepool deleted the image, but then proceeded to upload that image again. Obviously this is a change from nodepool gearman, but it opens an interesting issue: I don't think we have a way to roll back to the previously uploaded image | 16:45 |
jeblair | pabelanger: pause | 16:45 |
jeblair | pabelanger: pause then delete | 16:45 |
pabelanger | yes | 16:45 |
pabelanger | okay, so lets try that | 16:46 |
pabelanger | Hmm, actually, don't think we have pause for uploads yet, just builds | 16:46 |
pabelanger | let me double check | 16:46 |
jeblair | (we may also want a pause CLI) | 16:46 |
Shrews | ok, i think not removing precision will "just work" for us, making my sleep(1) patch unnecessary | 16:47 |
pabelanger | ya, CLI pause could be useful too | 16:47 |
jeblair | Shrews: well, i +3d it so that other changes don't hit the race. would you like me to revoke that, or do you just want to revert + change? | 16:48 |
Shrews | jeblair: i'll revert it in my next review | 16:48 |
jeblair | cool | 16:48 |
mordred | jeblair: would a "rollback" cli call be worthwhile? | 16:49 |
jeblair | mordred: the sequence there would be: "nodepool rollback <something>; fix; nodepool unpause <something>". versus "nodepool pause <something>; nodepool delete <something>; fix; nodepool unpause <something>" | 16:52 |
mordred | nod | 16:52 |
jeblair | mordred: i have a slight preference for the 3 step process on account of it's a bit more clear what's happening and what needs to be un-done when things are fixed | 16:52 |
jeblair | er, i guess that wasn't clear. i meant the second process. :) | 16:53 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: Do not truncate state_time precision https://review.openstack.org/407606 | 16:53 |
jeblair | mordred: i'm pretty sure we will need the individual commands anyway... we can always add rollback if we think it's useful. | 16:53 |
mordred | jeblair: yah. I follow - I think the main use case I was thinking of was "there is a content problem in all of the ubuntu-xenial images; we want to delete all of them in all providers and wait until tomorrow's rebuilds before we upload new images" | 16:54 |
mordred | but even that is likely not the right way to think about that _Really_ | 16:55 |
jeblair | mordred: yeah... i think that deleting the *image* might still work that way... | 16:56 |
jeblair | pabelanger: you deleted the *upload* from infracloud-vanilla, but not the base image, right? | 16:57 |
jeblair | mordred: i guess i should say that deleting the *build* might still work that way | 16:57 |
* jeblair increases terminology precision | 16:57 | |
jeblair | mordred: oh, no that wouldn't either. not the latest build. | 16:58 |
jeblair | so in both cases, something needs to be paused, since we aggressively try to build and upload. | 16:58 |
pabelanger | jeblair: can you rephrase? | 16:58 |
jeblair | pabelanger: your test deleted the upload from the cloud, not the diskimage, right? | 16:59 |
mordred | yah - I think such a rollback would need to do this for all providers: pause, delete upload, delete build, unpause - I guess the question is "is this a situation where just rebuilding now would fix the issue" or "do you need to pause/delete then take manual fixing steps" | 16:59 |
pabelanger | jeblair: right, we have 2 images uploaded, I did image-delete newest upload | 17:00 |
openstackgerrit | Merged openstack-infra/nodepool: Fix race condition in test_image_rotation_invalid_external_name https://review.openstack.org/407139 | 17:00 |
jeblair | mordred: exactly. so if an immediate rebuild will do, just delete the dib image. if work needs to happen first, pause/delete/unpause | 17:00 |
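A rough model of why pabelanger's deleted upload came straight back, and of the two recovery paths jeblair describes. The builder keeps uploading the latest build wherever it is missing, unless something is paused. `needs_upload()` and `recovery_plan()` are hypothetical helpers, and the step strings are illustrative, not real nodepool CLI commands (at this point a pause CLI did not exist yet).

```python
def needs_upload(latest_build, uploaded_builds, paused):
    """Hypothetical check: should the builder upload latest_build to a provider?

    uploaded_builds: ids of builds currently uploaded to that provider.
    """
    if paused:
        return False
    # Aggressive behaviour: any missing upload of the latest build is redone.
    return latest_build not in uploaded_builds

def recovery_plan(image, rebuild_would_fix):
    """Hypothetical: the two recovery paths from the discussion, as step lists."""
    if rebuild_would_fix:
        return [f"delete the latest build of {image}"]
    return [f"pause {image}", f"delete {image}", "fix the problem", f"unpause {image}"]

# Builds 9 and 10 were uploaded; the newest (10) was deleted by hand.
uploaded = {"0000000009"}
reupload_unpaused = needs_upload("0000000010", uploaded, paused=False)  # True
reupload_paused = needs_upload("0000000010", uploaded, paused=True)     # False
```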
openstackgerrit | Merged openstack-infra/nodepool: Fix race in test_image_rotation https://review.openstack.org/407539 | 17:00 |
jeblair | (this is all stuff that we will put in the nodepool docs :) | 17:01 |
mordred | yah | 17:01 |
jeblair | pabelanger, Shrews: can you weigh in on https://review.openstack.org/404452 ? | 17:33 |
pabelanger | Ah, I've been meaning to test that | 17:34 |
jeblair | will probably need to update the checksum patch if we land it first | 17:34 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Delete builds when diskimage removed from config https://review.openstack.org/400421 | 17:36 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Add pause option for image uploads https://review.openstack.org/407620 | 17:37 |
jeblair | pabelanger: i just noticed the existing pause test was in test_commands. can we move those to test_builder (and rework them to avoid using the cli?) | 17:39 |
*** abregman has quit IRC | 17:40 | |
pabelanger | jeblair: sure, I think we should keep test_dib_image_build_pause there, since we have some pause logic in nodepoolcmd.py | 17:42 |
pabelanger | but other could move | 17:42 |
jeblair | pabelanger: agreed. the next 2 can move i think. | 17:43 |
jeblair | pabelanger: i +2d that change if you want to do the move of both of those tests as a followup | 17:43 |
pabelanger | yes | 17:43 |
pabelanger | I'll rework them | 17:43 |
pabelanger | sure | 17:43 |
pabelanger | wow, our --checksum patch is finding all the things today | 17:46 |
pabelanger | http://logs.openstack.org/11/406411/9/check/nodepool-coverage-ubuntu-xenial/b9dba9a/console.html | 17:46 |
pabelanger | that one is new | 17:46 |
pabelanger | looks like we lost our connection to zookeeper | 17:46 |
pabelanger | Shrews: ^ might be interested in that one | 17:47 |
jeblair | 2016-12-06 17:38:11.376907 | DEBUG [gear.Client.nodepool] Processing input on <gear.Connection 0x7f9cb42b3150 host: localhost port: 33769> | 17:49 |
jeblair | 2016-12-06 17:38:40.174067 | INFO [gear.Client.nodepool] Received admin data <gear.AdminRequest 0x7f9cb429f450 command: status> | 17:49 |
jeblair | pabelanger: ^ that timestamp gap is really suspicious; like the host was unresponsive for 29 seconds | 17:49 |
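Spotting gaps like this by eye is error-prone. The following sketch scans log lines in the `YYYY-MM-DD HH:MM:SS.ffffff | ...` shape quoted above and reports consecutive entries more than a threshold apart. The sample lines are the two quoted by jeblair; `find_gaps()` is a hypothetical helper. The result is only meaningful when the timestamps record when each line was actually logged.

```python
from datetime import datetime

LOG_LINES = [
    "2016-12-06 17:38:11.376907 | DEBUG [gear.Client.nodepool] Processing input on <gear.Connection 0x7f9cb42b3150 host: localhost port: 33769>",
    "2016-12-06 17:38:40.174067 | INFO [gear.Client.nodepool] Received admin data <gear.AdminRequest 0x7f9cb429f450 command: status>",
]

def find_gaps(lines, threshold_seconds):
    """Return (previous, current, seconds) for each gap above the threshold."""
    gaps = []
    prev = None
    for line in lines:
        stamp = line.split(" | ", 1)[0]
        try:
            ts = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f")
        except ValueError:
            continue  # skip lines without a leading timestamp
        if prev is not None:
            delta = (ts - prev).total_seconds()
            if delta > threshold_seconds:
                gaps.append((prev, ts, delta))
        prev = ts
    return gaps

for start, end, seconds in find_gaps(LOG_LINES, threshold_seconds=10):
    print(f"{seconds:.1f}s gap between {start} and {end}")
```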
Shrews | pabelanger: looks like it never even started | 17:50 |
pabelanger | jeblair: So, that is about the 3rd time we've seen a gap of 20-30 seconds. I'm starting to wonder if we are starving the CPU or something | 17:50 |
Shrews | jeblair: will review that other one later. Not at a computer now | 17:51 |
openstackgerrit | Merged openstack-infra/nodepool: Fail on bad options to fake-image-create https://review.openstack.org/404452 | 17:55 |
*** hashar is now known as hasharEat | 18:02 | |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Add --checksum support to disk-image-create https://review.openstack.org/406411 | 18:02 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Don't use taskmanagers in builder https://review.openstack.org/405663 | 18:10 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Fix zookeeper config in test fixtures https://review.openstack.org/407632 | 18:10 |
jeblair | pabelanger, mordred: ^ that test fix could potentially cause some test flakiness, though i don't know if we've actually seen it in zuul runs | 18:11 |
jeblair | er, sorry, the fix could potentially fix it, not cause it. :) | 18:11 |
pabelanger | yay for fixes | 18:12 |
pabelanger | pretty happy how much we are finding just from our unit tests | 18:12 |
jeblair | oh, the timestamps in the test aren't real -- they are just when testr outputs everything at the end. i think we will need to update the formatter to get proper timestamps. | 18:14 |
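One way to get real timestamps is to have the logging formatter stamp each record. `%(asctime)s` is filled from the time the record was created, not from when a runner later prints the captured output. This is a minimal sketch with the standard `logging` module; the logger name is hypothetical, and change 407640 may do it differently.

```python
import io
import logging

def make_test_logger(stream):
    """Hypothetical helper: a logger whose lines carry creation timestamps."""
    logger = logging.getLogger("nodepool.tests.example")  # illustrative name
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    # asctime comes from record.created, i.e. when the log call happened.
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(levelname)-8s %(name)s: %(message)s"))
    logger.handlers.clear()
    logger.addHandler(handler)
    return logger

buf = io.StringIO()
log = make_test_logger(buf)
log.debug("Processing input")
output = buf.getvalue()  # e.g. "2016-12-06 18:14:02,123 DEBUG    nodepool.tests.example: ..."
```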
pabelanger | Ah, that explains a lot | 18:15 |
jeblair | i'll push a change for that | 18:16 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Include timestamps in test logs https://review.openstack.org/407640 | 18:21 |
jeblair | pabelanger, mordred: ^ | 18:22 |
SpamapS | jeblair: Shrews could you confirm that my statement is correct in the comments of https://storyboard.openstack.org/#!/story/2000812 ? | 18:23 |
pabelanger | jeblair: clarkb: --checksum patch is green again: https://review.openstack.org/#/c/406411/ if you don't mind adding it back into your review queue | 18:31 |
clarkb | I will take a peek once I get this xenialification change for designate passing and pushed | 18:32 |
pabelanger | WFM | 18:32 |
pabelanger | I'm going to look at launching nb02.o.o | 18:32 |
jeblair | SpamapS: looking | 18:37 |
jeblair | SpamapS: actually i don't think it is addressed. i think 811 addressed the thing that brought it to our attention, but when we tried to manually fix the bug described by 811, we noticed that running a manual image-delete didn't actually delete the image, which is what 812 is going on about. :) | 18:41 |
jeblair | SpamapS: i'll push up a change with a failing test for that | 18:41 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Make test_image_delete fail https://review.openstack.org/407649 | 18:43 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul: Add roadmap to README https://review.openstack.org/407213 | 18:56 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul: Add update storyboard script https://review.openstack.org/407229 | 18:57 |
jeblair | pabelanger: regarding 406411 -- i thought we were going to put a remove method on the dibimagefile? | 18:59 |
pabelanger | jeblair: I still can; that's what clarkb and I came up with for now. But if you don't mind providing some guidance, I'll update it again | 19:02 |
jeblair | pabelanger: i left a comment | 19:03 |
clarkb | oh I was just trying to model the object attributes with the class, but making removal a method too sounds good to me | 19:03 |
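Here is a sketch of what "a remove method on the image file object" could look like: the object models the image plus its checksum files and knows how to delete all of them. `DibImageFileSketch` and its fields are hypothetical stand-ins, not nodepool's actual `DibImageFile`.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class DibImageFileSketch:
    """Hypothetical stand-in for an image file plus its checksum files."""
    image_id: str
    extension: str
    md5_file: Optional[str] = None
    sha256_file: Optional[str] = None

    def to_path(self, images_dir: str) -> str:
        return os.path.join(images_dir, f"{self.image_id}.{self.extension}")

    def remove(self, images_dir: str) -> List[str]:
        """Delete the image and any checksum files; return the paths removed."""
        paths = [self.to_path(images_dir)]
        for name in (self.md5_file, self.sha256_file):
            if name:
                paths.append(os.path.join(images_dir, name))
        removed = []
        for path in paths:
            try:
                os.remove(path)
            except FileNotFoundError:
                continue  # already gone; nothing to do
            removed.append(path)
        return removed

with tempfile.TemporaryDirectory() as tmp:
    names = ("0000000001.qcow2", "0000000001.qcow2.md5", "0000000001.qcow2.sha256")
    for name in names:
        open(os.path.join(tmp, name), "w").close()
    image = DibImageFileSketch("0000000001", "qcow2", names[1], names[2])
    removed = image.remove(tmp)
    left = os.listdir(tmp)
```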
*** hasharEat is now known as hashar | 19:05 | |
pabelanger | woah | 19:06 |
pabelanger | | ubuntu-xenial-0000000010 | ubuntu-xenial | nb02 | | building | 00:00:01:57 | | 19:06 |
pabelanger | nb02 is our winner today for xenial | 19:07 |
pabelanger | I had just started it too | 19:07 |
*** jamielennox|away is now known as jamielennox | 19:07 | |
Shrews | pabelanger: oh, it's active already. neat. | 19:28 |
pabelanger | ya, it scooped up the build right away | 19:29 |
openstackgerrit | Jesse Keating proposed openstack-infra/zuul: Add support for github enterprise https://review.openstack.org/407675 | 19:33 |
openstackgerrit | Merged openstack-infra/nodepool: Include timestamps in test logs https://review.openstack.org/407640 | 19:33 |
*** pleia2 has quit IRC | 19:34 | |
*** pleia2 has joined #zuul | 19:34 | |
*** toabctl has quit IRC | 19:35 | |
*** toabctl has joined #zuul | 19:36 | |
SpamapS | jeblair: ah! ok, I'll mark 812 as in progress then | 19:36 |
SpamapS | oh n/m you did that :) | 19:37 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Add --checksum support to disk-image-create https://review.openstack.org/406411 | 19:46 |
pabelanger | jeblair: okay, I think I captured your request^. Not sure passing log is the right approach I took, but will update based on comments | 19:47 |
pabelanger | Also seeing some cross clean up across nb01 and nb02, each helping to remove images after we've uploaded a new one | 19:48 |
openstackgerrit | Jesse Keating proposed openstack-infra/zuul: Add support for github enterprise https://review.openstack.org/407675 | 19:49 |
openstackgerrit | Merged openstack-infra/nodepool: Do not truncate state_time precision https://review.openstack.org/407606 | 19:56 |
pabelanger | going to be calling it early today, need to dadops at home today. Everybody is sick | 20:00 |
pabelanger | I'll pick up the rest of my patches in the morning, unless somebody else beats me to it | 20:00 |
clarkb | ugh everyone is sick is my new ugh | 20:01 |
clarkb | we are all recovering from something | 20:01 |
pabelanger | however, nb01.o.o and nb02.o.o are both processing things, if people want to poke at them and look at logs | 20:01 |
openstackgerrit | Jesse Keating proposed openstack-infra/zuul: Add support for github enterprise https://review.openstack.org/407675 | 20:04 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: Fix image-delete command https://review.openstack.org/407649 | 20:05 |
Shrews | jeblair: ^^^ usurped your failure demonstration review | 20:06 |
Shrews | pabelanger: how would i poke at them and their logs? | 20:06 |
Shrews | or is that for folks with special powers? | 20:06 |
Shrews | pabelanger: also, the state_time fix merged | 20:07 |
clarkb | we should be hosting the image build logs and I thought there were changes to host the service logs that ianw wrote too | 20:10 |
clarkb | Shrews: look in and around http://nb01.openstack.org | 20:10 |
ianw | clarkb: we don't expose the upload logs ... we *could*, but i just wasn't 100% certain they were sane to expose | 20:11 |
Shrews | clarkb: all i see are dib build logs. i'm not interested in those | 20:11 |
clarkb | ianw: gotcha | 20:11 |
ianw | Shrews: yeah, we only output build logs. the other logs we don't just to avoid leaking info in them | 20:13 |
clarkb | I know that ksa is supposed to be safe now; it scrubs passwords | 20:15 |
ianw | yeah, the way we deploy though, with the "bring in the latest of everything automatically" model, means what's true now might not be in 5 minutes :) so that would be my concern | 20:17 |
mordred | clarkb: fwiw, I am pretty satisfied with the level of scrubbing done by ksa | 20:20 |
morgan | if we missed some scrubbing (not in debug mode), let us know | 20:38 |
morgan | we work very hard to keep secure data scrubbed in ksa | 20:38 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: Fix image-delete command https://review.openstack.org/407649 | 20:38 |
morgan | even in debug mode we do a reasonable job | 20:38 |
morgan | just less effort because debug is meant to expose actual data | 20:39 |
clarkb | ah ok so maybe only expose info and above | 20:40 |
clarkb | and call that a reasonable compromise between conservative and useful | 20:40 |
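The "only expose INFO and above" compromise can be expressed with handler levels in the standard `logging` module. One handler keeps everything privately, and a second, published handler drops DEBUG records (where mordred notes the REST payloads go). The logger name and messages are illustrative.

```python
import io
import logging

logger = logging.getLogger("example.builder")  # illustrative logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False

private = io.StringIO()  # full detail, kept on the host
public = io.StringIO()   # what would be published

private_handler = logging.StreamHandler(private)
private_handler.setLevel(logging.DEBUG)
public_handler = logging.StreamHandler(public)
public_handler.setLevel(logging.INFO)  # DEBUG records never reach this handler

logger.handlers.clear()
logger.addHandler(private_handler)
logger.addHandler(public_handler)

logger.debug("REQ: request/response detail that should stay private")
logger.info("Uploading image")
```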
mordred | clarkb: yah - and honestly, we cannot handle ksa debug logging in our logs | 20:41 |
mordred | it's SUPER HUGE | 20:41 |
mordred | (that's where we log every REST interaction and their payloads) | 20:41 |
harlowja | mordred hey | 21:06 |
harlowja | can u get the zookeeper package into epel | 21:06 |
harlowja | talk to some folks there... | 21:06 |
harlowja | https://review.openstack.org/#/c/406878/ (kolla folks trying to use it) | 21:06 |
harlowja | but it doesn't exist (still...) | 21:06 |
harlowja | use your SUPER powers to make that happen? | 21:07 |
mordred | uh | 21:08 |
mordred | harlowja: what possible powers do you think I have WRT epel? | 21:08 |
harlowja | all of them | 21:08 |
harlowja | u are TREMENDOUS | 21:08 |
harlowja | u are the one | 21:09 |
harlowja | lol | 21:09 |
mordred | harlowja: :) | 21:11 |
jeblair | Shrews: cool, thanks! | 21:12 |
mordred | rbergeron: ^^ you know fancy special people I think ... this is a thing that is about to be more important with zuul - anything sane we can do to help that? | 21:12 |
jeblair | mordred exercises his superpower | 21:12 |
Shrews | jeblair: may i ask you a question re: 400421? | 21:13 |
jeblair | Shrews: but of course | 21:13 |
Shrews | jeblair: i'm not 100% certain why you removed the checks against config.images_in_use in builder.py | 21:13 |
harlowja | mordred thxs the one | 21:14 |
Shrews | jeblair: i mean, they still seem like something we want to do in order to not build images that are in the config, but not in use | 21:15 |
jeblair | Shrews: well, i believe i was thinking at the time that we would want to build them even if they are not in use. but i neglected to recall that we would not know what format(s) to build. so perhaps i should add those back? | 21:16 |
Shrews | jeblair: i think the images_in_use is based only on the 'labels' section (something i did not have from my test .yaml earlier today and i hit this code) | 21:17 |
Shrews | so maybe we know the formats? | 21:17 |
jeblair | Shrews: the formats come from the provider:images: section | 21:17 |
Shrews | right, so if we have provider:images, and diskimages:, but no labels:, they would not be "in use" and we COULD still build, but do we want to? | 21:18 |
jeblair | basically, 'diskimages:' causes an image to be built, 'providers:images:' causes an image to get uploaded, and 'labels:' cause nodes to be created | 21:18 |
jeblair | my idea was to make things ^ that simple | 21:19 |
jeblair | however, the format thing makes it complicated | 21:19 |
jeblair | so it really needs to be: | 21:19 |
Shrews | jeblair: ah, that's been a source of confusion for me (the relationship b/w those things) | 21:19 |
jeblair | 'diskimages:' causes an image to be built (as long as there's at least one 'providers:images:' section to tell us the format), 'providers:images:' causes an image to get uploaded, and 'labels:' cause nodes to be created | 21:19 |
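The rule above can be sketched as a small function: a diskimage's formats are collected from every `providers:images:` entry that references it, and only diskimages with at least one format get built. The config dict and its `format` key are hypothetical simplifications, not nodepool's real schema.

```python
# Hypothetical, trimmed config shaped like the sections discussed above.
config = {
    "diskimages": [{"name": "ubuntu-xenial"}, {"name": "centos-7"}, {"name": "fedora-25"}],
    "providers": [
        {"name": "rax-ord", "format": "vhd",
         "images": [{"name": "ubuntu-xenial"}, {"name": "centos-7"}]},
        {"name": "infracloud-vanilla", "format": "qcow2",
         "images": [{"name": "ubuntu-xenial"}]},
    ],
}

def diskimage_formats(config):
    """Map each diskimage name to the set of formats its providers need."""
    formats = {d["name"]: set() for d in config["diskimages"]}
    for provider in config["providers"]:
        for image in provider["images"]:
            if image["name"] in formats:
                formats[image["name"]].add(provider["format"])
    return formats

def images_to_build(config):
    # A diskimage with no provider image has no known format, so skip it.
    return sorted(name for name, fmts in diskimage_formats(config).items() if fmts)

print(images_to_build(config))  # ['centos-7', 'ubuntu-xenial']
```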
jeblair | Shrews: fortunately, we accidentally simplified it (there was a more convoluted relationship between labels and provider-images) | 21:20 |
Shrews | jeblair: cool. so removing those checks doesn't guarantee we have provider:images: (i don't believe), so maybe we should keep them | 21:21 |
Shrews | but that would be a really odd config | 21:21 |
jeblair | Shrews: true, though what i really did there was to remove the "in use by labels" check and add in a new "in use by provider-images" check. so maybe i should replace the old checks with the new check. | 21:21 |
jeblair | to clarify... | 21:22 |
Shrews | jeblair: that would make sense | 21:22 |
Shrews | to my feeble mind, at least | 21:22 |
Shrews | jeblair: thanks for the clarification | 21:23 |
jeblair | Shrews: in config.py i replaced images_in_use with diskimage.in_use -- so i'm thinking that the old "don't build images that aren't in use by labels" checks could reasonably be replaced by "don't build images that don't show up in provider-images" | 21:23 |
Shrews | yup | 21:23 |
Shrews | that's much more logical than the original, tbh | 21:24 |
jeblair | Shrews: i could actually implement that as "if diskimage.formats" which might express the issue a little more clearly | 21:25 |
Shrews | that's fine too | 21:26 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Delete builds when diskimage removed from config https://review.openstack.org/400421 | 21:27 |
jeblair | Shrews: okay, that only affects checkimageforscheduledimageupdates; checkproviderimageupload simply won't even be called in that case, and cleanupimage already has a corresponding check | 21:28 |
*** willthames has joined #zuul | 21:29 | |
Shrews | jeblair: +1 | 21:30 |
jeblair | pabelanger: i'm sorry. i see the problems you're talking about with PS11 of 406411. given those, i think i now prefer ps10. i think what i'm envisioning would mean substantial changes to dibimagefile and how it's used, so it might be best to just hold off. | 21:51 |
jeblair | Shrews: i'm inclined to go with ps10 of 406411 over ps11 ^ can you take a look and let me know if you have a preference? | 21:52 |
Shrews | jeblair: yep. will look a bit later | 21:55 |
rbergeron | umm | 22:08 |
rbergeron | mordred: will have to read up, gimme a bit | 22:09 |
rbergeron | shrews: hai i'm in your city | 22:09 |
openstackgerrit | Jesse Keating proposed openstack-infra/zuul: Add support for github enterprise https://review.openstack.org/407675 | 22:09 |
Shrews | rbergeron: ohai! | 22:10 |
Shrews | rbergeron: will likely see you at thursday night dinner | 22:11 |
rbergeron | yay! | 22:12 |
rbergeron | mordred: oh god, zk isnt in epel? | 22:12 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Fix zookeeper config in test fixture https://review.openstack.org/407632 | 22:28 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Don't use taskmanagers in builder https://review.openstack.org/405663 | 22:28 |
openstackgerrit | Merged openstack-infra/nodepool: Add pause option for image uploads https://review.openstack.org/407620 | 22:32 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Pluralize zk nodes with children https://review.openstack.org/407736 | 22:32 |
openstackgerrit | Merged openstack-infra/nodepool: Fix image-delete command https://review.openstack.org/407649 | 22:34 |
mordred | rbergeron: so it seems? | 22:40 |
*** harlowja has quit IRC | 22:43 | |
*** harlowja has joined #zuul | 22:43 | |
*** hashar has quit IRC | 22:46 | |
ianw | harlowja: there is a rather large thread on that on internal rhos chat | 22:46 |
harlowja | ianw gets me some popcorn | 22:46 |
ianw | the gist is, zookeeper is not really anyone's priority | 22:47 |
ianw | but we have a few options | 22:47 |
harlowja | i tried to make it a customer request when i was at yahoo | 22:47 |
harlowja | i think that bumped the priority | 22:47 |
harlowja | but guess not high enough :-P | 22:47 |
ianw | i think it's fair to say that in terms of rhos, java dependencies are not wanted | 22:47 |
ianw | and focus there is really on tooz and using i think etcd | 22:48 |
* harlowja wonders how that works for zuulv300 | 22:48 | |
Shrews | jeblair: what are the ps11 concerns you mentioned? | 22:48 |
harlowja | ianw is the java fear really reasonable anymore? | 22:49 |
ianw | harlowja: tooz? it doesn't, it's zk specific | 22:49 |
ianw | i dunno, but that's the vibe :) | 22:49 |
harlowja | weird | 22:49 |
mordred | ianw: that's gonna be all sorts of fun when zuul v3 is ready if it only really works on Ubuntu | 22:50 |
mordred | ianw: I guess nobody cares too much about mesos/kafka either? | 22:50 |
* mordred mostly trying to sort out in which ways he should go prod which people | 22:51 | |
jeblair | Shrews: the logging is weird, for one. | 22:51 |
harlowja | the best way mordred | 22:51 |
harlowja | u should prod the best way | 22:51 |
* Shrews hands mordred his freshly sharpened prodding stick | 22:51 | |
*** yolanda has quit IRC | 22:52 | |
*** yolanda has joined #zuul | 22:54 | |
* mordred just starts stabbing, hoping that he hits someone | 22:54 | |
ianw | mordred: i'm going to try making some time with ggilles and we'll try bringing back some of the fedora stuff into his COPR repo and see just how far off it is | 22:54 |
ianw | if it is a complete cluster f- of dependencies that's not promising, but if it mostly works, we might have a path to EPEL | 22:55 |
mordred | sweet! let me know if there's any way I can be useful - although I'm guessing the likelihood of that is fairly low | 22:56 |
Shrews | jeblair: yeah, that is weird. The way we use mostly class methods of that class make its usage odd, IMO. i think i like ps10, too | 22:56 |
Shrews | jeblair: another thought... change from_image_id() to find the extra files, then use that and delete them in deleteLocalBuild() | 22:58 |
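Shrews' idea of finding the extra files from the image id could look like this. Glob for every file named `<image_id>.*` and delete them together. `find_image_files()` and `delete_local_build()` are hypothetical helpers, not nodepool's `from_image_id()` or `deleteLocalBuild()`. The literal dot after the id keeps `0000000001` from matching `00000000010`.

```python
import glob
import os
import tempfile

def find_image_files(images_dir, image_id):
    """Return every file in images_dir whose name starts with '<image_id>.'."""
    pattern = os.path.join(glob.escape(images_dir), glob.escape(image_id) + ".*")
    return sorted(glob.glob(pattern))

def delete_local_build(images_dir, image_id):
    """Delete all files belonging to one build; return their basenames."""
    removed = []
    for path in find_image_files(images_dir, image_id):
        os.remove(path)
        removed.append(os.path.basename(path))
    return removed

with tempfile.TemporaryDirectory() as tmp:
    for name in ("0000000001.qcow2", "0000000001.qcow2.md5",
                 "0000000001.vhd", "0000000002.qcow2"):
        open(os.path.join(tmp, name), "w").close()
    removed = delete_local_build(tmp, "0000000001")
    remaining = sorted(os.listdir(tmp))
```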
jeblair | Shrews: that sounds promising -- do you want to take a stab at that? | 22:59 |
Shrews | jeblair: sure. but tomorrow. :) | 22:59 |
jeblair | k | 22:59 |
*** saneax-_-|AFK is now known as saneax | 23:03 | |
*** yolanda has quit IRC | 23:04 | |
*** jamielennox is now known as jamielennox|away | 23:05 | |
*** jamielennox|away is now known as jamielennox | 23:06 | |
SpamapS | ianw: to be fair, ZK is the least badly behaved java program out there in terms of dependencies on bad parts of java. | 23:08 |
SpamapS | it's not maintained by the usual java crowd that vendors the world, only uses oracle java, and burns wings off flies for fun. | 23:09 |
jamielennox | jeblair: so i spent a while playing with the zuultrigger test yesterday - is this expected to work differently in v3 or am i misunderstanding it? | 23:09 |
jeblair | jamielennox: oh, let me take a look | 23:10 |
jamielennox | jeblair: so triggers used to get maintained via scheduler and a scheduler is per source, now triggers get maintained per pipeline | 23:10 |
jamielennox | and so one pipeline getting triggered from another is never getting the messages | 23:10 |
*** yolanda has joined #zuul | 23:13 | |
jeblair | jamielennox: i don't think the end behavior is expected to change | 23:15 |
jeblair | jamielennox: there's only one scheduler (that's basically the main loop) | 23:15 |
jeblair | jamielennox: the idea of the zuul trigger is that it has some special methods (eg onChangeMerged) that are called any time interesting events happen, they put synthetic events into the event queue so that any pipeline can end up matching those events in more or less the normal way | 23:17 |
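A toy model of the mechanism jeblair describes: the zuul trigger turns an internal happening (here `onChangeMerged`) into a synthetic event on the single shared event queue, and the main loop offers every event to every pipeline. This lets a pipeline react to something another pipeline caused. All class names, the event type string, and `process_events()` are illustrative, not zuul's actual code.

```python
import queue
from dataclasses import dataclass

@dataclass
class TriggerEvent:
    type: str
    project: str
    branch: str

class ZuulTriggerSketch:
    """Hypothetical: converts internal happenings into synthetic events."""
    def __init__(self, event_queue):
        self.event_queue = event_queue

    def onChangeMerged(self, project, branch):
        self.event_queue.put(TriggerEvent("project-change-merged", project, branch))

class PipelineSketch:
    def __init__(self, name, accepts):
        self.name = name
        self.accepts = set(accepts)  # event types this pipeline's filters match
        self.matched = []

    def maybe_handle(self, event):
        if event.type in self.accepts:
            self.matched.append(event)

def process_events(event_queue, pipelines):
    """One pass of the (single) scheduler loop: every pipeline sees every event."""
    while not event_queue.empty():
        event = event_queue.get()
        for pipeline in pipelines:
            pipeline.maybe_handle(event)

events = queue.Queue()
trigger = ZuulTriggerSketch(events)
check = PipelineSketch("check", ["project-change-merged"])
gate = PipelineSketch("gate", ["comment-added"])
trigger.onChangeMerged("org/project", "master")
process_events(events, [check, gate])
```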
jamielennox | jeblair: ok, so the current failure is basically: https://github.com/openstack-infra/zuul/blob/feature/zuulv3/zuul/manager/__init__.py#L331 | 23:18 |
jamielennox | self.sched.triggers is never populated, triggers live at pipeline.triggers | 23:19 |
jeblair | jamielennox: ah, gotcha | 23:19 |
jamielennox | so self.sched is global and so triggers can be cross pipeline | 23:19 |
jamielennox | there is a note from jhesketh here: https://github.com/openstack-infra/zuul/blob/feature/zuulv3/zuul/scheduler.py#L240 that i'm not sure applies any more | 23:20 |
jamielennox | it seems to be relative to loading/unloading which seems to be removed in v3 | 23:21 |
jamielennox | the easy solve is to add back all triggers to sched.triggers as well | 23:22 |
*** yolanda has quit IRC | 23:22 | |
jamielennox | i tried replacing that block with simply send a global parent merged to the scheduler but the handling gets interesting | 23:23 |
jeblair | jamielennox: yeah, but i think that might be the only remaining use of that, so i'm thinking maybe we should be getting it/them another way | 23:23 |
jamielennox | jeblair: ++ | 23:26 |
jamielennox | it seems odd to have to iterate over all triggers to get them started when this is the only place it applies, hence stopping to ask | 23:27 |
jeblair | jamielennox: there are a couple of things that strike me as odd here; let me try to collect thoughts. back in a few. | 23:28 |
jamielennox | jeblair: no worries, i can find other things if you want to mull that one for a while, just thought i'd ask | 23:28 |
*** yolanda has joined #zuul | 23:35 | |
jeblair | jamielennox: take a look at https://etherpad.openstack.org/p/MFU7KKGIWH and let me know if that makes sense... | 23:42 |
jeblair | clarkb: can you look at that quickly? i'm especially interested what you think about option 2 there... | 23:43 |
*** yolanda has quit IRC | 23:43 | |
* clarkb catches up | 23:43 | |
jeblair | jhesketh: ^ also | 23:44 |
clarkb | I think the long term plan was to rely on gerrit checking for merge conflicts. However, do other tools like github support it? if not we may want to keep it as an option for zuul | 23:45 |
jhesketh | jeblair: looking | 23:45 |
jeblair | clarkb: yeah, i have a slight worry that as soon as we remove it, we'll add it back, either for this or something else :| | 23:46 |
clarkb | that said, maybe we can just keep the centralized triggers and have them farm out to the pipelines? | 23:46 |
mordred | I believe github does support reporting mergeability | 23:46 |
mordred | https://github.com/ansible/ansible/pull/18785 shows "This branch has no conflicts with the base branch" | 23:47 |
clarkb | cool | 23:47 |
mordred | "mergeable": true, | 23:48 |
mordred | from https://developer.github.com/v3/pulls/#get-a-single-pull-request (near the bottom of the very large json blob) | 23:48 |
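The `mergeable` field is part of the GitHub REST API's single-pull-request response. The sketch below interprets it. The sample payloads are made up and heavily trimmed. GitHub computes mergeability in a background job, so the field can be `null` until that job finishes; a consumer should treat that as "unknown, check again later" rather than as a conflict.

```python
import json
from typing import Optional


def pull_request_mergeable(pr_json: str) -> Optional[bool]:
    """Return True/False from a pull request payload, or None if GitHub
    has not finished computing mergeability yet (field null or absent)."""
    data = json.loads(pr_json)
    value = data.get("mergeable")
    if value is None:
        return None
    return bool(value)


# Hypothetical, heavily trimmed payloads for illustration only.
print(pull_request_mergeable('{"number": 1, "mergeable": true}'))   # True
print(pull_request_mergeable('{"number": 2, "mergeable": false}'))  # False
print(pull_request_mergeable('{"number": 3, "mergeable": null}'))   # None
```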
clarkb | jeblair: like maybe have both things where there are singletons that handle the critical sections that need "locking" but otherwise abstract out into trigger per pipeline? | 23:48 |
clarkb | I dunno just talking out loud | 23:48 |
clarkb | er thinking out loud | 23:48 |
jeblair | i think the parent-change-enqueued thing was to pull in cross-repo "depends-on" with unshared queues | 23:49 |
jeblair | it's possible that we still want that, and the only reason we aren't using it is that $someone forgot to write a project-config change to turn it on | 23:50 |
jeblair | yeah, the commit message suggests that was the reason for adding parent-change-enqueued | 23:51 |
jeblair | so i'm leaning toward option 1, independent of whether we continue to use project-change-merged | 23:52 |
jhesketh | jeblair: I'd lean towards option 1, as even if we find a way not to use zuul-triggers for now, they are a useful feature we may find ourselves wanting in the future | 23:54 |
jeblair | jhesketh: yeah, that would not surprise me | 23:54 |
jhesketh | also if not us, others | 23:54 |
jamielennox | jeblair: so my opinion is that it's strange to have triggers handling the onChanged event anyway | 23:56 |
jamielennox | this would seem to be a core part of zuul and something that doesn't need to live out in triggers | 23:56 |
jamielennox | zuul core can emit the event every time and then triggers are the things that listen and handle it | 23:56 |
jamielennox | which i think is option 1 | 23:57 |
jeblair | jamielennox: i agree. so i think that's how we should implement #1 -- except that i do think that the 'core' should check with all the triggers to see if the event will be used since it's expensive to create these events. | 23:57 |
clarkb | jeblair: basically only emit the event if something is subscribed to it | 23:58 |
clarkb | (in pub sub terms) | 23:58 |
jeblair | clarkb: yep | 23:58 |
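A minimal publish/subscribe sketch of the approach agreed here. `EventBusSketch` and its methods are hypothetical names. The core always "emits" an event type, but it checks whether anything is subscribed before running the expensive code that builds the event payload, and skips that work otherwise. Passing a builder callable instead of a ready-made payload is what makes the skip possible.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List


class EventBusSketch:
    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[Any], None]) -> None:
        self._subscribers[event_type].append(handler)

    def has_subscribers(self, event_type: str) -> bool:
        # .get() avoids creating an empty entry in the defaultdict.
        return bool(self._subscribers.get(event_type))

    def emit(self, event_type: str, build_payload: Callable[[], Any]) -> int:
        """Build the payload only if someone listens; return handlers called."""
        if not self.has_subscribers(event_type):
            return 0  # the expensive build_payload() never runs
        payload = build_payload()
        handlers = self._subscribers[event_type]
        for handler in handlers:
            handler(payload)
        return len(handlers)


builds = []


def expensive_build():
    builds.append(1)  # counts how often the costly work actually ran
    return ["change-1", "change-2"]


bus = EventBusSketch()
print(bus.emit("project-change-merged", expensive_build))  # 0
received = []
bus.subscribe("project-change-merged", received.append)
print(bus.emit("project-change-merged", expensive_build))  # 1
print(len(builds), received)  # 1 [['change-1', 'change-2']]
```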
jeblair | (project-change-merged in particular is crazy -- it's "enqueue an event for every open change for this project". it can take a very long time to run on a cold cache) | 23:58 |
jamielennox | why is the expense in event creation, vs. just adding the event and letting the triggers decide if they want to handle it? | 23:58 |
jamielennox | i mean, at that point it's less about parent-change and more about just emitting a change-merged event and having the trigger determine if it has anything that needs to be re-enqueued based on that change going in | 23:59 |
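To make the cost trade-off concrete, here is a sketch with hypothetical names. In the eager style, `project-change-merged` fans out to one event per open change in the project. In the lazy style, the core emits a single `change-merged` event and the trigger decides whether to do that fan-out at all, which also satisfies the "only if something is subscribed" check. The `open_changes` callable stands in for what would really be a possibly slow, cold-cache query against the code review system.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Event:
    type: str
    project: str
    change: str


def fan_out_project_change_merged(
        project: str, open_changes: Callable[[str], List[str]]) -> List[Event]:
    """Eager style: one synthetic event per open change in the project."""
    return [Event("project-change-merged", project, c)
            for c in open_changes(project)]


class ChangeMergedTriggerSketch:
    """Lazy style: the core emits one change-merged event; the trigger fans
    out only if some pipeline actually wants project-change-merged."""

    def __init__(self, wanted: bool, open_changes: Callable[[str], List[str]]):
        self.wanted = wanted
        self.open_changes = open_changes

    def handle(self, event: Event) -> List[Event]:
        if event.type != "change-merged" or not self.wanted:
            return []  # no subscriber: skip the expensive lookup entirely
        return fan_out_project_change_merged(event.project, self.open_changes)


queries = []


def open_changes(project: str) -> List[str]:
    queries.append(project)  # stands in for an expensive lookup
    return ["1002,1", "1003,2"]


merged = Event("change-merged", "org/project", "1001,1")
print(len(ChangeMergedTriggerSketch(False, open_changes).handle(merged)))  # 0
print(len(ChangeMergedTriggerSketch(True, open_changes).handle(merged)))   # 2
print(queries)  # ['org/project']
```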