Tuesday, 2016-12-06

*** Cibo_ has joined #zuul		00:50
Shrews	pabelanger: i have a suspicion about that random cleanup failure. waiting for it to reproduce to confirm, but i think what's happening is that sometimes the upload state_time for 001 and 002 is the same, so 001 could sort before 002 in the upload recency table, so we see it as one of the 2 most recent uploads (not 002 and 003)	03:47
Shrews	(state_time is a rounded number because we int() it). getting things to happen quick enough, and at just the right time, is why it would be so hard to recreate	03:49
Shrews	jeblair: maybe we shouldn't int() the state_time?	03:51
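A minimal sketch of the suspected race, not nodepool's actual code: `most_recent_uploads` and the upload dicts are hypothetical stand-ins for the upload recency table, and the timestamps are made up. It shows how truncating `state_time` with `int()` can make uploads 001 and 002 tie, so that a stable newest-first sort keeps 001 ahead of 002.

```python
from operator import itemgetter


def most_recent_uploads(uploads, count=2):
    """Return the ids of the `count` newest uploads (hypothetical helper)."""
    # sorted() is stable, even with reverse=True, so uploads with an equal
    # state_time keep their input order.
    newest_first = sorted(uploads, key=itemgetter("state_time"), reverse=True)
    return [u["id"] for u in newest_first[:count]]


# Illustrative timestamps: 001 and 002 land within the same second.
uploads = [
    {"id": "001", "state_time": 1481033836.10},
    {"id": "002", "state_time": 1481033836.90},
    {"id": "003", "state_time": 1481033838.20},
]

# int() truncates rather than rounds, so 001 and 002 collapse to one value.
truncated = [dict(u, state_time=int(u["state_time"])) for u in uploads]

full_precision_ids = most_recent_uploads(uploads)
truncated_ids = most_recent_uploads(truncated)
```

With full precision the two newest uploads are 003 and 002. After truncation, 001 can take the place of 002, which matches the symptom Shrews describes.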
jamielennox	hey, just before i sink any more time into the tests is zuultrigger expected to work in the same way ?	04:11
*** mordred has quit IRC		04:26
*** greghaynes has quit IRC		04:26
*** Shrews has quit IRC		04:27
*** phschwartz has quit IRC		04:27
*** phschwartz has joined #zuul		05:00
*** Shrews has joined #zuul		05:03
*** mordred has joined #zuul		05:04
*** greghaynes has joined #zuul		05:04
*** anteaya has quit IRC		05:16
*** mordred has quit IRC		05:18
*** Shrews has quit IRC		05:18
*** Shrews has joined #zuul		05:26
*** mordred has joined #zuul		05:27
*** anteaya has joined #zuul		05:29
*** saneax-_-|AFK is now known as saneax		07:11
*** bstinson has quit IRC		07:23
*** mordred has quit IRC		07:25
*** bstinson has joined #zuul		07:28
*** phschwartz has quit IRC		07:29
*** Shrews has quit IRC		07:30
*** mordred has joined #zuul		07:34
*** phschwartz has joined #zuul		07:34
*** Shrews has joined #zuul		07:35
*** abregman has joined #zuul		08:17
*** willthames has quit IRC		09:57
*** hashar has joined #zuul		10:06
*** Zara_ is now known as Zara		10:10
*** jamielennox is now known as jamielennox|away		11:44
*** abregman has quit IRC		12:04
*** Cibo_ has quit IRC		12:16
*** abregman has joined #zuul		13:02
Shrews	pabelanger: so, yep, looks like state_time issue. we could add a .5s sleep between 001 and 002 creation in the test to get rid of that	14:24
openstackgerrit	David Shrewsbury proposed openstack-infra/nodepool: Fix race in test_image_rotation https://review.openstack.org/407539	14:36
*** Cibo_ has joined #zuul		14:37
Shrews	pabelanger: ^^^ had to use a full second b/c rounding up or down isn't performed	14:38
*** saneax is now known as saneax-_-|AFK		14:44
openstackgerrit	Merged openstack-infra/nodepool: Delete hard upload failures from current builds https://review.openstack.org/406342	15:04
Shrews	oh, w00t for that merging	15:05
pabelanger	clarkb: Ya looking at it now	15:06
clarkb	pabelanger: sorry I wasn't able to track it down further	15:07
Shrews	pabelanger: i'm going to fix the merge failures of our test fixes	15:07
pabelanger	Shrews: ack	15:08
pabelanger	clarkb: I was thinking about it more last night, maybe we should just touch the md5 / sha256 files, since we are mocking uploads to shade	15:08
pabelanger	clarkb: going to try that this morning and see if there is a difference	15:08
pabelanger	or use static md5 / sha256 contents	15:08
clarkb	pabelanger: well that will fail the if md5 check	15:08
clarkb	pabelanger: but you could put arbitrary content in them for sure	15:09
pabelanger	Ya, that	15:09
clarkb	ya maybe give that a try	15:09
openstackgerrit	Merged openstack-infra/zuul: Update storyboard links in README https://review.openstack.org/407212	15:09
openstackgerrit	David Shrewsbury proposed openstack-infra/nodepool: Fix race in test_image_rotation https://review.openstack.org/407539	15:11
openstackgerrit	David Shrewsbury proposed openstack-infra/nodepool: Fix race condition in test_image_rotation_invalid_external_name https://review.openstack.org/407139	15:11
openstackgerrit	Paul Belanger proposed openstack-infra/nodepool: Add --checksum support to disk-image-create https://review.openstack.org/406411	15:13
*** saneax-_-|AFK is now known as saneax		15:30
pabelanger	okay, running latest version of nodepool-builder	15:31
pabelanger	just seen	15:31
pabelanger	2016-12-06 15:28:52,190 INFO nodepool.builder.CleanupWorker.0: Removing failed upload record: <ImageUpload {'build_id': u'0000000009', 'stat': ZnodeStat(czxid=955502, mzxid=955502, ctime=1481033836083, mtime=1481033836083, version=0, cversion=0, aversion=0, ephemeralOwner=0, dataLength=92, numChildren=0, pzxid=955502), 'external_name': None, 'state_time': 1481033836, 'image_name': u'centos-7', 'state':	15:31
pabelanger	u'uploading', 'provider_name': 'rax-ord', 'external_id': None, 'id': u'0000000002'}>	15:31
pabelanger	looks like the clean up worked	15:32
mordred	\o/	15:34
clarkb	pabelanger: looks like coverage passed on 406411 but base py27 did not	15:35
pabelanger	clarkb: ya, that is the race in https://review.openstack.org/#/c/407139/	15:35
clarkb	gotcha	15:35
pabelanger	we should land that stack this morning, Shrews added another patch too	15:36
*** abregman has quit IRC		15:36
*** abregman has joined #zuul		15:37
Shrews	pabelanger: fyi, took 487 runs overnight to reproduce that with the additional debug info i added :(	15:41
pabelanger	Shrews: Wow, where is the log for that?	15:42
Shrews	pabelanger: i have it locally	15:42
pabelanger	okay	15:42
Shrews	lemme see if i can find it again and paste for you	15:42
Shrews	pabelanger: http://paste.openstack.org/show/591555/	15:45
Shrews	line 504 was key	15:45
Shrews	but you can see the recency table included 001 and not 002	15:46
pabelanger	Ya, see that	15:47
*** saneax is now known as saneax-_-|AFK		15:47
Shrews	i'm going to try keeping the state_time precision and see what breaks. using int() on it is a carry over from the old code and i'm not convinced that's necessary	15:49
pabelanger	ack	15:50
*** Cibo_ has quit IRC		16:17
mordred	Shrews: I'd love it if keeping precision would let us not have a sleep(1) in the tests	16:40
Shrews	mordred: me too. testing it now	16:42
jeblair	i think there was actually a sleep(1) or maybe even a sleep(2) in the original nodepool snapshot code to make sure that each snapshot had a unique timestamp. so this is an improvement. :) but yeah, i think adding the sleep in the test is okay, but if we don't need it, even better.	16:42
pabelanger	So, doing some ops things on nb01.o.o, I just deleted the latest ubuntu-xenial image in infracloud-vanilla. nodepool deleted the image, but then proceeded to upload that image again. Obviously this is a change from nodepool gearman, but it opens an interesting issue: I don't think we have a way to roll back to the previous uploaded image	16:45
jeblair	pabelanger: pause	16:45
jeblair	pabelanger: pause then delete	16:45
pabelanger	yes	16:45
pabelanger	okay, so let's try that	16:46
pabelanger	Hmm, actually, don't think we have pause for uploads yet, just builds	16:46
pabelanger	let me double check	16:46
jeblair	(we may also want a pause CLI)	16:46
Shrews	ok, i think not removing precision will "just work" for us, making my sleep(1) patch unnecessary	16:47
pabelanger	ya, CLI pause could be useful too	16:47
jeblair	Shrews: well, i +3d it so that other changes don't hit the race. would you like me to revoke that, or do you just want to revert + change?	16:48
Shrews	jeblair: i'll revert it in my next review	16:48
jeblair	cool	16:48
mordred	jeblair: would a "rollback" cli call be worthwhile?	16:49
jeblair	mordred: the sequence there would be: "nodepool rollback <something>; fix; nodepool unpause <something>". versus "nodepool pause <something>; nodepool delete <something>; fix; nodepool unpause <something>"	16:52
mordred	nod	16:52
jeblair	mordred: i have a slight preference for the 3 step process on account of it's a bit more clear what's happening and what needs to be un-done when things are fixed	16:52
jeblair	er, i guess that wasn't clear. i meant the second process. :)	16:53
openstackgerrit	David Shrewsbury proposed openstack-infra/nodepool: Do not truncate state_time precision https://review.openstack.org/407606	16:53
jeblair	mordred: i'm pretty sure we will need the individual commands anyway... we can always add rollback if we think it's useful.	16:53
mordred	jeblair: yah. I follow - I think the main usecase I was thinking of was "there is a content problem in all of the ubuntu-xenial images we want to delete all of them in all providers and wait until tomorrow's rebuilds before we upload new images"	16:54
mordred	but even that is likely not the right way to think about that _Really_	16:55
jeblair	mordred: yeah... i think that deleting the image might still work that way...	16:56
jeblair	pabelanger: you deleted the upload from infracloud-vanilla, but not the base image, right?	16:57
jeblair	mordred: i guess i should say that deleting the build might still work that way	16:57
* jeblair increases terminology precision		16:57
jeblair	mordred: oh, no that wouldn't either. not the latest build.	16:58
jeblair	so in both cases, something needs to be paused. since we aggressively try to build and upload.	16:58
pabelanger	jeblair: can you rephrase?	16:58
jeblair	pabelanger: your test deleted the upload from the cloud, not the diskimage, right?	16:59
mordred	yah - I think such a rollback would need to for all providers: pause, delete upload, delete build, unpause - I guess the question is "is this a situation where just rebuilding now would fix the issue" or "do you need to pause/delete then take manual fixing steps"	16:59
pabelanger	jeblair: right, we have 2 images uploaded, I did image-delete newest upload	17:00
openstackgerrit	Merged openstack-infra/nodepool: Fix race condition in test_image_rotation_invalid_external_name https://review.openstack.org/407139	17:00
jeblair	mordred: exactly. so if an immediate rebuild will do, just delete the dib image. if work needs to happen first, pause/delete/unpause	17:00
openstackgerrit	Merged openstack-infra/nodepool: Fix race in test_image_rotation https://review.openstack.org/407539	17:00
jeblair	(this is all stuff that we will put in the nodepool docs :)	17:01
mordred	yah	17:01
jeblair	pabelanger, Shrews: can you weigh in on https://review.openstack.org/404452 ?	17:33
pabelanger	Ah, I've been meaning to test that	17:34
jeblair	will probably need to update the checksum patch if we land it first	17:34
openstackgerrit	James E. Blair proposed openstack-infra/nodepool: Delete builds when diskimage removed from config https://review.openstack.org/400421	17:36
openstackgerrit	Paul Belanger proposed openstack-infra/nodepool: Add pause option for image uploads https://review.openstack.org/407620	17:37
jeblair	pabelanger: i just noticed the existing pause test was in test_commands. can we move those to test_builder (and rework them to avoid using the cli?)	17:39
*** abregman has quit IRC		17:40
pabelanger	jeblair: sure, I think we should keep test_dib_image_build_pause there, since we have some pause logic in nodepoolcmd.py	17:42
pabelanger	but other could move	17:42
jeblair	pabelanger: agreed. the next 2 can move i think.	17:43
jeblair	pabelanger: i +2d that change if you want to do the move of both of those tests as a followup	17:43
pabelanger	yes	17:43
pabelanger	I'll rework them	17:43
pabelanger	sure	17:43
pabelanger	wow, our --checksum patch is finding all the things today	17:46
pabelanger	http://logs.openstack.org/11/406411/9/check/nodepool-coverage-ubuntu-xenial/b9dba9a/console.html	17:46
pabelanger	that one is new	17:46
pabelanger	looks like we lost our connection to zookeeper	17:46
pabelanger	Shrews: ^ might be interested in that one	17:47
jeblair	2016-12-06 17:38:11.376907 | DEBUG [gear.Client.nodepool] Processing input on <gear.Connection 0x7f9cb42b3150 host: localhost port: 33769>	17:49
jeblair	2016-12-06 17:38:40.174067 | INFO [gear.Client.nodepool] Received admin data <gear.AdminRequest 0x7f9cb429f450 command: status>	17:49
jeblair	pabelanger: ^ that timestamp gap is really suspicious; like the host was unresponsive for 29 seconds	17:49
Shrews	pabelanger: looks like it never even started	17:50
pabelanger	jeblair: So, that is about the 3rd time we've seen a gap of 20-30 seconds. I'm starting to wonder if we are starving the CPU or something	17:50
Shrews	jeblair: will review that other one later. Not at a computer now	17:51
openstackgerrit	Merged openstack-infra/nodepool: Fail on bad options to fake-image-create https://review.openstack.org/404452	17:55
*** hashar is now known as hasharEat		18:02
openstackgerrit	Paul Belanger proposed openstack-infra/nodepool: Add --checksum support to disk-image-create https://review.openstack.org/406411	18:02
openstackgerrit	James E. Blair proposed openstack-infra/nodepool: Don't use taskmanagers in builder https://review.openstack.org/405663	18:10
openstackgerrit	James E. Blair proposed openstack-infra/nodepool: Fix zookeeper config in test fixtures https://review.openstack.org/407632	18:10
jeblair	pabelanger, mordred: ^ that test fix could potentially cause some test flakiness, though i don't know if we've actually seen it in zuul runs	18:11
jeblair	er, sorry, the fix could potentially fix it, not cause it. :)	18:11
pabelanger	yay for fixes	18:12
pabelanger	pretty happy how much we are finding just from our unit tests	18:12
jeblair	oh, the timestamps in the test aren't real -- they are just when testr outputs everything at the end. i think we will need to update the formatter to get proper timestamps.	18:14
pabelanger	Ah, that explains a lot	18:15
jeblair	i'll push a change for that	18:16
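One way to get per-record timestamps is sketched below with the standard `logging` module. This is an assumption about the general approach, not the change jeblair pushed; the logger name `example.builder` and the helper `make_timestamped_handler` are hypothetical.

```python
import io
import logging


def make_timestamped_handler(stream):
    """Build a handler whose records carry their own creation time."""
    handler = logging.StreamHandler(stream)
    # %(asctime)s comes from the record's creation time, not from the
    # moment the test runner finally prints the captured output.
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    return handler


stream = io.StringIO()
log = logging.getLogger("example.builder")  # hypothetical logger name
log.setLevel(logging.DEBUG)
log.propagate = False  # keep the example's output in our stream only
log.addHandler(make_timestamped_handler(stream))

log.info("Received admin data")
output = stream.getvalue()
```

Each captured line then starts with the time the record was created, so gaps between lines reflect what actually happened during the test.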
openstackgerrit	James E. Blair proposed openstack-infra/nodepool: Include timestamps in test logs https://review.openstack.org/407640	18:21
jeblair	pabelanger, mordred: ^	18:22
SpamapS	jeblair: Shrews could you confirm that my statement is correct in the comments of https://storyboard.openstack.org/#!/story/2000812 ?	18:23
pabelanger	jeblair: clarkb: --checksum patch is green again: https://review.openstack.org/#/c/406411/ if you don't mind adding it back into your review queue	18:31
clarkb	I will take a peek once I get this xenialification change for designate passing and pushed	18:32
pabelanger	WFM	18:32
pabelanger	I'm going to look at launching nb02.o.o	18:32
jeblair	SpamapS: looking	18:37
jeblair	SpamapS: actually i don't think it is addressed. i think 811 addressed the thing that brought it to our attention, but when we tried to manually fix the bug described by 811, we noticed that running a manual image-delete didn't actually delete the image, which is what 812 is going on about. :)	18:41
jeblair	SpamapS: i'll push up a change with a failing test for that	18:41
openstackgerrit	James E. Blair proposed openstack-infra/nodepool: Make test_image_delete fail https://review.openstack.org/407649	18:43
openstackgerrit	James E. Blair proposed openstack-infra/zuul: Add roadmap to README https://review.openstack.org/407213	18:56
openstackgerrit	James E. Blair proposed openstack-infra/zuul: Add update storyboard script https://review.openstack.org/407229	18:57
jeblair	pabelanger: regarding 406411 -- i thought we were going to put a remove method on the dibimagefile?	18:59
pabelanger	jeblair: I still can, that's what clarkb and I came up with for now. But, if you don't mind providing some guidance, I'll update it again	19:02
jeblair	pabelanger: i left a comment	19:03
clarkb	oh I was just trying to model the object attributes with the class, but making removal a method too sounds good to me	19:03
*** hasharEat is now known as hashar		19:05
pabelanger	woah	19:06
pabelanger	| ubuntu-xenial-0000000010 | ubuntu-xenial | nb02 | | building | 00:00:01:57 |	19:06
pabelanger	nb02 is our winner today for xenial	19:07
pabelanger	I had just started it too	19:07
*** jamielennox|away is now known as jamielennox		19:07
Shrews	pabelanger: oh, it's active already. neat.	19:28
pabelanger	ya, it scooped up the build right away	19:29
openstackgerrit	Jesse Keating proposed openstack-infra/zuul: Add support for github enterprise https://review.openstack.org/407675	19:33
openstackgerrit	Merged openstack-infra/nodepool: Include timestamps in test logs https://review.openstack.org/407640	19:33
*** pleia2 has quit IRC		19:34
*** pleia2 has joined #zuul		19:34
*** toabctl has quit IRC		19:35
*** toabctl has joined #zuul		19:36
SpamapS	jeblair: ah! ok, I'll mark 812 as in progress then	19:36
SpamapS	oh n/m you did that :)	19:37
openstackgerrit	Paul Belanger proposed openstack-infra/nodepool: Add --checksum support to disk-image-create https://review.openstack.org/406411	19:46
pabelanger	jeblair: okay, I think I captured your request^. Not sure passing log is the right approach I took, but will update based on comments	19:47
pabelanger	Also seeing some cross clean up across nb01 and nb02, each helping to remove images after we've uploaded a new one	19:48
openstackgerrit	Jesse Keating proposed openstack-infra/zuul: Add support for github enterprise https://review.openstack.org/407675	19:49
openstackgerrit	Merged openstack-infra/nodepool: Do not truncate state_time precision https://review.openstack.org/407606	19:56
pabelanger	going to be calling it early today, need to dadops at home today. Everybody is sick	20:00
pabelanger	I'll pick up the rest of my patches in the morning, unless somebody else beats me to it	20:00
clarkb	ugh everyone is sick is my new ugh	20:01
clarkb	we are all recovering from something	20:01
pabelanger	however, nb01.o.o and nb02.o.o are both processing things, if people want to poke at them and look at logs	20:01
openstackgerrit	Jesse Keating proposed openstack-infra/zuul: Add support for github enterprise https://review.openstack.org/407675	20:04
openstackgerrit	David Shrewsbury proposed openstack-infra/nodepool: Fix image-delete command https://review.openstack.org/407649	20:05
Shrews	jeblair: ^^^ usurped your failure demonstration review	20:06
Shrews	pabelanger: how would i poke at them and their logs?	20:06
Shrews	or is that for folks with special powers?	20:06
Shrews	pabelanger: also, the state_time fix merged	20:07
clarkb	we should be hosting the image build logs and I thought there were changes to host the service logs that ianw wrote too	20:10
clarkb	Shrews: look in and around http://nb01.openstack.org	20:10
ianw	clarkb: we don't expose the upload logs ... we could, but i just wasn't 100% certain they were sane to expose	20:11
Shrews	clarkb: all i see are dib build logs. i'm not interested in those	20:11
clarkb	ianw: gotcha	20:11
ianw	Shrews: yeah, we only output build logs. the other logs we don't just to avoid leaking info in them	20:13
clarkb	I know that ksa is supposed to be safe now, it scrubs passwords	20:15
ianw	yeah, the way we deploy though, with the "bring in the latest of everything automatically" model, means what's true now might not be in 5 minutes :) so that would be my concern	20:17
mordred	clarkb: fwiw, I am pretty satisfied with the level of scrubbing done by ksa	20:20
morgan	if we missed some scrubbing (not in debug mode), let us know	20:38
morgan	we work very hard to keep secure data scrubbed in ksa	20:38
openstackgerrit	David Shrewsbury proposed openstack-infra/nodepool: Fix image-delete command https://review.openstack.org/407649	20:38
morgan	even in debug mode we do a reasonable job	20:38
morgan	just less effort because debug is meant to expose actual data	20:39
clarkb	ah ok so maybe only expose info and above	20:40
clarkb	and call that a reasonable compromise between conservative and useful	20:40
mordred	clarkb: yah - and honestly, we cannot handle ksa debug logging in our logs	20:41
mordred	it's SUPER HUGE	20:41
mordred	(that's where we log every REST interaction and their payloads)	20:41
harlowja	mordred hey	21:06
harlowja	can u get the zookeeper package into epel	21:06
harlowja	talk to some folks there...	21:06
harlowja	https://review.openstack.org/#/c/406878/ (kolla folks trying to use it)	21:06
harlowja	but it doesn't exist (still...)	21:06
harlowja	use your SUPER powers to make that happen?	21:07
mordred	uh	21:08
mordred	harlowja: what possible powers do you think I have WRT epel?	21:08
harlowja	all of them	21:08
harlowja	u are TREMENDOUS	21:08
harlowja	u are the one	21:09
harlowja	lol	21:09
mordred	harlowja: :)	21:11
jeblair	Shrews: cool, thanks!	21:12
mordred	rbergeron: ^^ you know fancy special people I think ... this is a thing that is about to be more important with zuul - anything sane we can do to help that?	21:12
jeblair	mordred exercises his superpower	21:12
Shrews	jeblair: may i ask you a question re: 400421?	21:13
jeblair	Shrews: but of course	21:13
Shrews	jeblair: i'm not 100% certain why you removed the checks against config.images_in_use in builder.py	21:13
harlowja	mordred thxs the one	21:14
Shrews	jeblair: i mean, they still seem like something we want to do in order to not build images that are in the config, but not in use	21:15
jeblair	Shrews: well, i believe i was thinking at the time that we would want to build them even if they are not in use. but i neglected to recall that we would not know what format(s) to build. so perhaps i should add those back?	21:16
Shrews	jeblair: i think the images_in_use is based only on the 'labels' section (something i did not have from my test .yaml earlier today and i hit this code)	21:17
Shrews	so maybe we know the formats?	21:17
jeblair	Shrews: the formats come from the provider:images: section	21:17
Shrews	right, so if we have provider:images, and diskimages:, but no labels:, they would not be "in use" and we COULD still build, but do we want to?	21:18
jeblair	basically, 'diskimages:' causes an image to be built, 'providers:images:' causes an image to get uploaded, and 'labels:' cause nodes to be created	21:18
jeblair	my idea was to make things ^ that simple	21:19
jeblair	however, the format thing makes it complicated	21:19
jeblair	so it really needs to be:	21:19
Shrews	jeblair: ah, that's been a source of confusion for me (the relationship b/w those things)	21:19
jeblair	'diskimages:' causes an image to be built (as long as there's at least one 'providers:images:' section to tell us the format), 'providers:images:' causes an image to get uploaded, and 'labels:' cause nodes to be created	21:19
jeblair	Shrews: fortunately, we accidentally simplified it (there was a more convoluted relationship between labels and provider-images)	21:20
Shrews	jeblair: cool. so removing those checks doesn't guarantee we have provider:images: (i don't believe), so maybe we should keep them	21:21
Shrews	but that would be a really odd config	21:21
jeblair	Shrews: true, though what i really did there was to remove the "in use by labels" check and add in a new "in use by provider-images" check. so maybe i should replace the old checks with the new check.	21:21
jeblair	to clarify...	21:22
Shrews	jeblair: that would make sense	21:22
Shrews	to my feeble mind, at least	21:22
Shrews	jeblair: thanks for the clarification	21:23
jeblair	Shrews: in config.py i replaced images_in_use with diskimage.in_use -- so i'm thinking that the old "don't build images that aren't in use by labels" checks could reasonably be replaced by "don't build images that don't show up in provider-images"	21:23
Shrews	yup	21:23
Shrews	that's much more logical than the original, tbh	21:24
jeblair	Shrews: i could actually implement that as "if diskimage.formats" which might express the issue a little more clearly	21:25
Shrews	that's fine too	21:26
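A rough sketch of the relationship jeblair summarizes, assuming a heavily simplified config dict. The `plan` function and the key names `name`, `format`, and `image` are hypothetical and do not mirror nodepool's real config schema; only `diskimages`, `providers`, `images`, and `labels` come from the discussion.

```python
def plan(config):
    """Work out what gets built, uploaded, and launched (hypothetical)."""
    formats = {}  # diskimage name -> set of formats some provider needs
    uploads = []  # (provider name, diskimage name) pairs
    for provider in config.get("providers", []):
        for image in provider.get("images", []):
            formats.setdefault(image, set()).add(provider["format"])
            uploads.append((provider["name"], image))

    # Rough equivalent of "if diskimage.formats": build only when at least
    # one provider-image tells us which format to produce.
    builds = {
        name: sorted(formats[name])
        for name in config.get("diskimages", [])
        if formats.get(name)
    }
    # Assumption for this sketch: only built images can be uploaded.
    uploads = [u for u in uploads if u[1] in builds]
    # Labels are what cause nodes to be created.
    node_labels = [label["name"] for label in config.get("labels", [])]
    return builds, uploads, node_labels


config = {
    "diskimages": ["ubuntu-xenial", "centos-7"],
    "providers": [
        {"name": "cloud-a", "format": "qcow2", "images": ["ubuntu-xenial"]},
    ],
    "labels": [{"name": "ubuntu-xenial", "image": "ubuntu-xenial"}],
}

builds, uploads, node_labels = plan(config)
```

In this example `centos-7` is listed under `diskimages` but no provider asks for it, so there is no known format and it is not built.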
openstackgerrit	James E. Blair proposed openstack-infra/nodepool: Delete builds when diskimage removed from config https://review.openstack.org/400421	21:27
jeblair	Shrews: okay, that only affects checkimageforscheduledimageupdates; checkproviderimageupload simply won't even be called in that case, and cleanupimage already has a corresponding check	21:28
*** willthames has joined #zuul		21:29
Shrews	jeblair: +1	21:30
jeblair	pabelanger: i'm sorry. i see the problems you're talking about with PS11 of 406411. given those, i think i now prefer ps10. i think what i'm envisioning would mean substantial changes to dibimagefile and how it's used, so it might be best to just hold off.	21:51
jeblair	Shrews: i'm inclined to go with ps10 of 406411 over ps11 ^ can you take a look and let me know if you have a preference?	21:52
Shrews	jeblair: yep. will look a bit later	21:55
rbergeron	umm	22:08
rbergeron	mordred: will have to read up, gimme a bit	22:09
rbergeron	shrews: hai i'm in your city	22:09
openstackgerrit	Jesse Keating proposed openstack-infra/zuul: Add support for github enterprise https://review.openstack.org/407675	22:09
Shrews	rbergeron: ohai!	22:10
Shrews	rbergeron: will likely see you at thursday night dinner	22:11
rbergeron	yay!	22:12
rbergeron	mordred: oh god, zk isnt in epel?	22:12
openstackgerrit	James E. Blair proposed openstack-infra/nodepool: Fix zookeeper config in test fixture https://review.openstack.org/407632	22:28
openstackgerrit	James E. Blair proposed openstack-infra/nodepool: Don't use taskmanagers in builder https://review.openstack.org/405663	22:28
openstackgerrit	Merged openstack-infra/nodepool: Add pause option for image uploads https://review.openstack.org/407620	22:32
openstackgerrit	James E. Blair proposed openstack-infra/nodepool: Pluralize zk nodes with children https://review.openstack.org/407736	22:32
openstackgerrit	Merged openstack-infra/nodepool: Fix image-delete command https://review.openstack.org/407649	22:34
mordred	rbergeron: so it seems?	22:40
*** harlowja has quit IRC		22:43
*** harlowja has joined #zuul		22:43
*** hashar has quit IRC		22:46
ianw	harlowja: there is a rather large thread on that on internal rhos chat	22:46
harlowja	ianw gets me some popcorn	22:46
ianw	the gist is, zookeeper is not really anyone's priority	22:47
ianw	but we have a few options	22:47
harlowja	i tried to make it a customer request when i was at yahoo	22:47
harlowja	i think that bumped the priority	22:47
harlowja	but guess not high enough :-P	22:47
ianw	i think it's fair to say that in terms of rhos, java dependencies are not wanted	22:47
ianw	and focus there is really on tooz and using i think etcd	22:48
* harlowja wonders how that works for zuulv300		22:48
Shrews	jeblair: what are the ps11 concerns you mentioned?	22:48
harlowja	ianw is the java fear really reasonable anymore?	22:49
ianw	harlowja: tooz? it doesn't, it's zk specific	22:49
ianw	i dunno, but that's the vibe :)	22:49
harlowja	weird	22:49
mordred	ianw: that's gonna be all sorts of fun when zuul v3 is ready if it only really works on Ubuntu	22:50
mordred	ianw: I guess nobody cares too much about mesos/kafka either?	22:50
* mordred mostly trying to sort out in which ways he should go prod which people		22:51
jeblair	Shrews: the logging is weird, for one.	22:51
harlowja	the best way mordred	22:51
harlowja	u should prod the best way	22:51
* Shrews hands mordred his freshly sharpened prodding stick		22:51
*** yolanda has quit IRC		22:52
*** yolanda has joined #zuul		22:54
* mordred just starts stabbing, hoping that he hits someone		22:54
ianw	mordred: i'm going to try making some time with ggilles and we'll try bringing back some of the fedora stuff into his COPR repo and see just how far off it is	22:54
ianw	if it is a complete cluster f- of dependencies that's not promising, but if it mostly works, we might have a path to EPEL	22:55
mordred	sweet! let me know if there's any way I can be useful - although I'm guessing the likelihood of that is fairly low	22:56
Shrews	jeblair: yeah, that is weird. The way we use mostly class methods of that class make its usage odd, IMO. i think i like ps10, too	22:56
Shrews	jeblair: another thought... change from_image_id() to find the extra files, then use that and delete them in deleteLocalBuild()	22:58
jeblair	Shrews: that sounds promising -- do you want to take a stab at that?	22:59
Shrews	jeblair: sure. but tomorrow. :)	22:59
jeblair	k	22:59
*** saneax-_-|AFK is now known as saneax		23:03
*** yolanda has quit IRC		23:04
*** jamielennox is now known as jamielennox|away		23:05
*** jamielennox|away is now known as jamielennox		23:06
SpamapS	ianw: to be fair, ZK is the least badly behaved java program out there in terms of dependencies on bad parts of java.	23:08
SpamapS	it's not maintained by the usual java crowd that vendors the world, only uses oracle java, and burns wings off flies for fun.	23:09
jamielennox	jeblair: so i spent a while playing with the zuultrigger test yesterday - is this expected to work differently in v3 or am i misunderstanding it?	23:09
jeblair	jamielennox: oh, let me take a look	23:10
jamielennox	jeblair: so triggers used to get maintained via scheduler and a scheduler is per source, now triggers get maintained per pipeline	23:10
jamielennox	and so one pipeline getting triggered from another is never getting the messages	23:10
*** yolanda has joined #zuul		23:13
jeblair	jamielennox: i don't think the end behavior is expected to change	23:15
jeblair	jamielennox: there's only one scheduler (that's basically the main loop)	23:15
jeblair	jamielennox: the idea of the zuul trigger is that it has some special methods (eg onChangeMerged) that are called any time interesting events happen, they put synthetic events into the event queue so that any pipeline can end up matching those events in more or less the normal way	23:17
jamielennox	jeblair: ok, so the current failure is basically: https://github.com/openstack-infra/zuul/blob/feature/zuulv3/zuul/manager/__init__.py#L331	23:18
jamielennox	self.sched.triggers is never populated, triggers live at pipeline.triggers	23:19
jeblair	jamielennox: ah, gotcha	23:19
jamielennox	so self.sched is global and so triggers can be cross pipeline	23:19
jamielennox	there is a note from jhesketh here: https://github.com/openstack-infra/zuul/blob/feature/zuulv3/zuul/scheduler.py#L240 that i'm not sure applies any more	23:20
jamielennox	it seems to be relative to loading/unloading which seems to be removed in v3	23:21
jamielennox	the easy solve is to add back all triggers to sched.triggers as well	23:22
*** yolanda has quit IRC		23:22
jamielennox	i tried replacing that block with simply sending a global parent merged to the scheduler but the handling gets interesting	23:23
jeblair	jamielennox: yeah, but i think that might be the only remaining use of that, so i'm thinking maybe we should be getting it/them another way	23:23
jamielennox	jeblair: ++	23:26
jamielennox	it seems odd to have to iterate all triggers to get them started and this is the only place it applies, hence the stop and ask	23:27
jeblair	jamielennox: there are a couple of things that strike me as odd here; let me try to collect thoughts. back in a few.	23:28
jamielennox	jeblair: no worries, i can find other things if you want to mull that one for a while, just thought i'd ask	23:28
*** yolanda has joined #zuul		23:35
jeblair	jamielennox: take a look at https://etherpad.openstack.org/p/MFU7KKGIWH and let me know if that makes sense...	23:42
jeblair	clarkb: can you look at that quickly? i'm especially interested what you think about option 2 there...	23:43
*** yolanda has quit IRC		23:43
* clarkb catches up		23:43
jeblair	jhesketh: ^ also	23:44
clarkb	I think the long term plan was to rely on gerrit checking for merge conflicts. However, do other tools like github support it? if not we may want to keep it as an option for zuul	23:45
jhesketh	jeblair: looking	23:45
jeblair	clarkb: yeah, i have a slight worry that as soon as we remove it, we'll add it back, either for this or something else :|	23:46
clarkb	that said maybe we can just keep the centralized triggerers and have them farm out to the pipelines?	23:46
mordred	I believe github does support reporting mergability	23:46
mordred	https://github.com/ansible/ansible/pull/18785 shows "This branch has no conflicts with the base branch"	23:47
clarkb	cool	23:47
mordred	"mergeable": true,	23:48
mordred	from https://developer.github.com/v3/pulls/#get-a-single-pull-request (near the bottom of the very large json blob)	23:48
clarkb	jeblair: like maybe have both things where there are singletons that handle the critical sections that need "locking" but otherwise abstract out into trigger per pipeline?	23:48
clarkb	I dunno just talking out loud	23:48
clarkb	er thinking out loud	23:48
jeblair	i think the parent-change-enqueued thing was to pull in cross-repo "depends-on" with unshared queues	23:49
jeblair	it's possible that we still want that, and the only reason we aren't using it is that $someone forgot to write a project-config change to turn it on	23:50
jeblair	yeah, the commit message suggests that was the reason for adding parent-change-enqueued	23:51
jeblair	so i'm leaning toward option 1, independent of whether we continue to use project-change-merged	23:52
jhesketh	jeblair: I'd lean towards option 1 as even if we find a way not to use zuul-triggers for now, they are a useful feature we may find ourselves wanting in the future	23:54
jeblair	jhesketh: yeah, that would not surprise me	23:54
jhesketh	also if not us, others	23:54
jamielennox	jeblair: so my opinion is that it's strange to have triggers handling the onChanged event anyway	23:56
jamielennox	this would seem to be a core part of zuul and something that doesn't need to live out in triggers	23:56
jamielennox	zuul core can emit the event every time and then triggers are the things that listen and handle it	23:56
jamielennox	which i think is option 1	23:57
jeblair	jamielennox: i agree. so i think that's how we should implement #1 -- except that i do think that the 'core' should check with all the triggers to see if the event will be used since it's expensive to create these events.	23:57
clarkb	jeblair: basically only emit the event if something is subscribed to it	23:58
clarkb	(in pub sub terms)	23:58
jeblair	clarkb: yep	23:58
jeblair	(project-change-merged in particular is crazy -- it's "enqueue an event for every open change for this project". it can take a very long time to run on a cold cache)	23:58
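The "only emit if something is subscribed" idea can be sketched roughly as follows. The `Trigger` and `Core` classes and their methods are hypothetical illustrations, not zuul's API; only the event names `project-change-merged` and `parent-change-enqueued` come from the discussion.

```python
class Trigger:
    """Hypothetical trigger that declares which synthetic events it uses."""

    def __init__(self, wanted):
        self.wanted = set(wanted)
        self.received = []

    def handle(self, event):
        self.received.append(event)


class Core:
    """Hypothetical stand-in for the core that emits events to triggers."""

    def __init__(self, triggers):
        self.triggers = triggers

    def emit(self, event_type, make_events):
        subscribers = [t for t in self.triggers if event_type in t.wanted]
        if not subscribers:
            return 0  # nobody listens: skip the expensive work entirely
        events = list(make_events())  # only now pay the creation cost
        for trigger in subscribers:
            for event in events:
                trigger.handle(event)
        return len(events)


calls = []


def open_changes():
    calls.append("queried")  # stands in for a slow query on a cold cache
    return ["change-1", "change-2"]


idle = Core([Trigger(["parent-change-enqueued"])])
skipped = idle.emit("project-change-merged", open_changes)

busy = Core([Trigger(["project-change-merged"])])
emitted = busy.emit("project-change-merged", open_changes)
```

Passing a callable rather than a prebuilt list is the design choice that matters here: the expensive enumeration of open changes runs only when at least one trigger has asked for that event type.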
jamielennox	why is the expense in event creation vs just add the event and let the triggers decide if they want to handle it?	23:58
jamielennox	i mean it's less parent-change at that point and just emit a change-merged event and have the trigger determine if it has anything that needs to be re-enqueued based on that change going in	23:59
