*** hashar has quit IRC | 00:18 | |
openstackgerrit | John L. Villalovos proposed openstack-infra/zuul master: Only depend-on open changes https://review.openstack.org/254957 | 00:20 |
*** jamielennox is now known as jamielennox|away | 00:36 | |
*** jamielennox|away is now known as jamielennox | 00:37 | |
openstackgerrit | Merged openstack-infra/zuul master: Only depend-on open changes https://review.openstack.org/254957 | 00:40 |
*** saneax is now known as saneax-_-|AFK | 00:55 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: WIP: Improve job dependencies using graph instead of tree https://review.openstack.org/443973 | 01:26 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Update test config for job graph syntax https://review.openstack.org/444055 | 01:26 |
jeblair | that's an auto-generated update to the test config files. | 01:27 |
openstackgerrit | Jesse Keating proposed openstack-infra/zuul feature/zuulv3: Support for github commit status https://review.openstack.org/444060 | 01:47 |
jamielennox | in zuul 2.5 is there a way from the CLI to stop a job and mark it failed | 03:24 |
*** saneax-_-|AFK is now known as saneax | 04:09 | |
*** saneax is now known as saneax-_-|AFK | 05:51 | |
*** saneax-_-|AFK is now known as saneax | 05:59 | |
*** saneax is now known as saneax-_-|AFK | 06:27 | |
*** bhavik1 has joined #zuul | 07:16 | |
*** saneax-_-|AFK is now known as saneax | 07:20 | |
*** Cibo has joined #zuul | 07:42 | |
*** Cibo_ has quit IRC | 07:45 | |
*** Cibo has quit IRC | 07:46 | |
*** dmsimard has quit IRC | 08:12 | |
*** bhavik1 has quit IRC | 08:14 | |
*** dmsimard has joined #zuul | 08:35 | |
*** hashar has joined #zuul | 10:05 | |
*** Cibo has joined #zuul | 10:17 | |
*** Cibo has quit IRC | 10:22 | |
*** openstackgerrit has quit IRC | 10:33 | |
*** bhavik1 has joined #zuul | 10:41 | |
*** bhavik1 has quit IRC | 11:02 | |
*** saneax is now known as saneax-_-|AFK | 11:28 | |
*** Cibo has joined #zuul | 12:29 | |
*** mptacekx has joined #zuul | 12:30 | |
mptacekx | Hi zuul, can someone please suggest good nodepool version for zuul v2.5.x ? | 12:31 |
*** hashar is now known as rahsah | 12:32 | |
mptacekx | we tried latest nodepool 0.4.0 but some strange behavior is seen, like from time to time VMs building / deleting in never-ending loops ... | 12:32 |
*** Cibo has quit IRC | 12:34 | |
*** yolanda has quit IRC | 12:46 | |
*** yolanda has joined #zuul | 13:06 | |
pabelanger | jamielennox: no CLI command | 13:12 |
pabelanger | mptacekx: we are using 0.4.0 in production for openstack, what do the logs say? | 13:12 |
pabelanger | sounds like your DIBs are failing to build, they get deleted, and start again | 13:13 |
mptacekx | pabelanger: actually it's about VMs, we noticed that from time to time they are in building state but have already been available and manually accessible for several minutes. sometimes it helps if I connect to them manually, then nodepool also passes its check or the VM goes to delete state, so some change is triggered | 13:15 |
pabelanger | mptacekx: check the debug log for nodepool, it will say why it is deleting the VMs | 13:17 |
mptacekx | pabelanger: usually it fails on OpenStackCloudTimeout: Timeout waiting for the server to come up. | 13:19 |
mptacekx | , occasionally on Exception: Unable to run ready script | 13:19 |
mptacekx | how often nodepool should be checking that ? | 13:20 |
pabelanger | mptacekx: so you are hitting 2 issues, first is you fail to boot a VM on your cloud. For that, you'll need to debug the cloud side and see why that is, could be lack of IPs, resources, etc. | 13:21 |
pabelanger | as for ready-script, if you have nodepool setup to use it, it will be run for every node launch. Again, you'll have to debug the output of your ready-script to see what it is doing, common issues are networking to the VM and possible DNS issues from the VM to internet | 13:23 |
mptacekx | pabelanger: I think it's a nodepool issue, the VM is spawned properly and I can access it myself, but nodepool can't. It's not a functional problem blocking all attempts, it happens e.g. on 1 of 5 VMs spawned in parallel | 13:24 |
*** Cibo_ has joined #zuul | 13:24 | |
pabelanger | mptacekx: do you mind posting your debug logs? That will tell us if it is nodepool or cloud | 13:25 |
pabelanger | it is possible you might need to update your clouds.yaml file | 13:25 |
*** hashar has joined #zuul | 13:26 | |
pabelanger | also, nodepool is hard on clouds, so launching 5 VMs might be an issue too. Easy way to test that, is limit jobs to a single VM to start | 13:26 |
mptacekx | this might help a lot, do you know how to do that ? I think it's simply failing when too many attempts are in parallel | 13:27 |
pabelanger | mptacekx: max-servers setting | 13:27 |
*** Cibo_ has quit IRC | 13:28 | |
mptacekx | pabelanger: this will just set a hard limit for the number of servers, I thought there might be some way to tell nodepool not to spawn 5 VMs at once | 13:29 |
*** yolanda has quit IRC | 13:29 | |
mptacekx | or are you suggesting to slowly increase max-servers limits to avoid that ? | 13:29 |
*** yolanda has joined #zuul | 13:30 | |
pabelanger | right, I would decrease max-servers now, to see if you are having cloud issues. | 13:30 |
pabelanger | but yes, it is a hard limit for the cloud | 13:30 |
pabelanger | otherwise, just have nodepool keep doing what is does, it is pretty aggressive about booting nodes. Eventually it will boot something :) | 13:31 |
mptacekx | thanks, I will explore that option. There is definitely something wrong in the cloud but I thought there was a second issue in nodepool itself, since as I mentioned it times out accessing a VM which is normally accessible. Is there any way to debug that apart from the logs in /var/log/nodepool/* ? | 13:33 |
mptacekx | All I have is nodepool timeout in that logs | 13:33 |
pabelanger | mptacekx: we also have a rate setting, you could try playing with. Time, in seconds, to wait between operations for a provider | 13:33 |
mptacekx | pabelanger: rate setting ? please elaborate a little more | 13:34 |
pabelanger | mptacekx: that to me sounds like a networking issue, once the VM is online, nodepool cannot SSH into it. | 13:34 |
pabelanger | mptacekx: see: https://docs.openstack.org/infra/nodepool/configuration.html#provider | 13:35 |
mptacekx | how often nodepool is trying ? each min ? | 13:35 |
pabelanger | trying what? | 13:35 |
mptacekx | vm to pass ssh check and turn from build to ready state | 13:35 |
pabelanger | look at boot-timeout, I believe the default is 60 secs | 13:36 |
mptacekx | thanks a lot, I will play with all of that stuff | 13:37 |
pabelanger | sure, np | 13:37 |
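For reference, the three knobs discussed above (max-servers, rate, boot-timeout) are all per-provider options in nodepool.yaml. A hedged fragment for the nodepool 0.4.x era, with an illustrative provider name and values:

```yaml
providers:
  - name: mycloud        # illustrative provider name
    max-servers: 1       # hard cap; lower it to test cloud health
    rate: 1.0            # seconds to wait between API operations
    boot-timeout: 120    # seconds to wait for the server to come up
```

Lowering max-servers to 1, as pabelanger suggests, isolates whether a single boot succeeds before parallel launches are reintroduced.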
dmsimard | So I never (ever) wrote a spec before but I took a shot at writing one for "a job reporting interface" in Zuul -- hope it makes sense: https://review.openstack.org/#/c/444088/ | 13:48 |
*** openstackgerrit has joined #zuul | 14:13 | |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Add reporter for Federated Message Bus (fedmsg) https://review.openstack.org/426861 | 14:13 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Re-enable test_alien_list_fail and alien_list cmd https://review.openstack.org/443714 | 14:58 |
mordred | dmsimard: nice | 15:02 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Add 'requestor' to NodeRequest model https://review.openstack.org/443151 | 15:07 |
jeblair | dmsimard: thanks, that looks good. i'll reply with some comments soon. can you point me at an ara server install i can browse? | 15:08 |
dmsimard | jeblair: ara is available in openstack-ansible and kolla-ansible jobs already -- let me give you a few examples | 15:08 |
jeblair | dmsimard: no i meant one running on a server, not the statically generated site | 15:09 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Add back statsd reporting https://review.openstack.org/443605 | 15:09 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Remove old/dead classes https://review.openstack.org/443644 | 15:09 |
dmsimard | jeblair: sure, but there's no difference, though | 15:09 |
dmsimard | there's no disparity in features | 15:09 |
jeblair | dmsimard: well, i wondered what you see when you go do "http://ara.example.com/" :) | 15:10 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Remove job_list, job_create, job_delete cmds/tests https://review.openstack.org/444344 | 15:10 |
dmsimard | jeblair: I mean, there's http://ara-demo.dmsimard.com/ | 15:10 |
dmsimard | although hang on, that one isn't up to date | 15:10 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Add leaked instance cleanup https://review.openstack.org/443690 | 15:11 |
jeblair | mordred: can you do a pass of zuul changes please? some of mine have been sitting for a few days. | 15:12 |
jeblair | mordred: also, feedback on 443973 and 443985 would be appreciated | 15:13 |
openstackgerrit | Monty Taylor proposed openstack-infra/nodepool feature/zuulv3: Stop json-encoding the nodepool metadata https://review.openstack.org/410812 | 15:16 |
mordred | jeblair: yup - was just doing the walk on the nodepool changes | 15:16 |
jeblair | mordred: w00t | 15:17 |
*** mptacekx has quit IRC | 15:17 | |
dmsimard | jeblair: okay, I updated http://ara-demo.dmsimard.com/ to the latest version (there had been some bugfixes/improvements since I last updated it) | 15:17 |
mordred | jeblair: gah. I could have sworn I reviewed some of these already | 15:22 |
jeblair | mordred, Shrews: in 410812 i wonder if, instead of removing snapshot_image_id, we should update it to record the zookeeper image upload id? that's what it actually is in the current version (recall 'snapshot ~= upload`) | 15:24 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Re-enable test_image_upload_fail https://review.openstack.org/444349 | 15:24 |
jeblair | mordred, Shrews: otoh, since there's an arbitrary limit on the number of metadata fields, perhaps we should drop it. | 15:27 |
pabelanger | Shrews: yay for statsd things | 15:27 |
Shrews | jeblair: i had a similar question on node_id. i don't currently populate it. should we? | 15:27 |
Shrews | pabelanger: i'm sure we'll need to adjust the keys | 15:28 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Add a test for a broken config on startup https://review.openstack.org/441499 | 15:31 |
pabelanger | Shrews: we can do something testing this morning, if you like. Get some data into graphite.o.o then update dashboards as needed | 15:31 |
Shrews | pabelanger: i'll leave that for you if you want. i'm going to re-enable all the things, then FINALLY remove mysql \o/ | 15:32 |
pabelanger | Shrews: sure, I'll see if the code has landed and restart nl01.o.o | 15:33 |
pabelanger | also, Yay for database removal | 15:33 |
jeblair | Shrews: the old leak algorithm used it to look up the node record to see if it's known. you switched to scanning all of the nodes and checking server id. if we switched back to the old method, we would end up retrieving fewer znode records (since we would only pull the ones for our provider). if we want to stick with the full scan, we can remove it. | 15:33 |
jeblair | pabelanger: please don't have new nodepool send stats to statsd if it has the same provider names as production nodepool. | 15:34 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Re-enable test_nodepool_osc_config_reload https://review.openstack.org/444356 | 15:34 |
jeblair | pabelanger: we'll end up overwriting our production data. | 15:34 |
jeblair | pabelanger: as a solution, we can either adopt the clarkb method of using different provider names (and performing extra image uploads), or you can configure a statsd prefix so all the new stuff goes to a different place | 15:36 |
Shrews | jeblair: hrm, lemme think that one over. checking external_id does seem safer, but using the node id would be more efficient. | 15:36 |
pabelanger | jeblair: Ah, right. good call. | 15:36 |
pabelanger | let me see how to configure a prefix | 15:36 |
jeblair | pabelanger: 'grep -i statsd_prefix' i think will turn up some things | 15:37 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Re-enable TestWebApp tests https://review.openstack.org/444358 | 15:42 |
pabelanger | jeblair: looks like export STATSD_PREFIX is a thing | 15:44 |
pabelanger | reading up more now | 15:44 |
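The statsd client nodepool uses reads its configuration from environment variables, so a distinct prefix keeps the v3 launcher's keys away from the production namespace, as jeblair asks. A hedged example (host name illustrative):

```shell
# Environment read by the statsd client before starting the launcher.
export STATSD_HOST=graphite.example.com   # illustrative host
export STATSD_PORT=8125
export STATSD_PREFIX=nodepool-v3          # keys land under nodepool-v3.*
```

With the prefix set, metrics from the v3 launcher are written under a separate subtree even when provider names match production.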
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Provide file locations of config syntax errors https://review.openstack.org/441606 | 15:54 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Clarify Job/Build/BuildSet docstrings https://review.openstack.org/435948 | 15:56 |
mordred | jeblair: the job graph change looks good to me - other than not passing tests - but who needs tests | 16:00 |
mordred | jeblair: +1 on the spec change | 16:02 |
openstackgerrit | Monty Taylor proposed openstack-infra/nodepool feature/zuulv3: Rename osc to occ in tests https://review.openstack.org/444383 | 16:05 |
jeblair | mordred: cool, i've just about finished cleaning up the tests | 16:06 |
*** yolanda has quit IRC | 16:06 | |
mordred | jeblair: awesome | 16:06 |
mordred | Shrews: ^^ I just submitted a meaningless followup to one of your patches | 16:06 |
*** yolanda has joined #zuul | 16:06 | |
pabelanger | mordred: soooo, how much longer until we can finger zuulv3 :) | 16:07 |
mordred | pabelanger: oh right. I need to finish those patches | 16:07 |
Shrews | mordred: stellar | 16:07 |
pabelanger | mordred: I'm happy to help too | 16:08 |
*** yolanda has quit IRC | 16:11 | |
Shrews | hrm... why is this node failing and locked? http://logs.openstack.org/44/444344/1/check/gate-dsvm-nodepool/7ab80aa/console.html#_2017-03-10_16_05_48_329818 | 16:16 |
*** yolanda has joined #zuul | 16:18 | |
Shrews | ok, this should not be happening: http://logs.openstack.org/44/444344/1/check/gate-dsvm-nodepool/7ab80aa/logs/screen-nodepool.txt.gz#_2017-03-10_15_48_15_721 | 16:19 |
mordred | Shrews: I blame jaypipes | 16:20 |
Shrews | mordred: how random | 16:23 |
Shrews | pabelanger: you may want to hold off on updating nl01 until we investigate this odd failure | 16:23 |
pabelanger | Shrews: ack | 16:24 |
Shrews | and i'm a bit stumped atm | 16:24 |
Shrews | oh! i know. it doesn't have an external id yet, so it's racing | 16:25 |
Shrews | wheeeeee | 16:26 |
*** yolanda has quit IRC | 16:26 | |
Shrews | i blame EVERYONE else for not catching my mistake | 16:26 |
*** yolanda has joined #zuul | 16:27 | |
Shrews | i think we'll have to do jeblair's suggestion to store node_id in metadata and check that against ZK | 16:27 |
pabelanger | Shrews: we should also think about merging master into feature/zuulv3 for nodepool. Will pick up some testing fixes specifically. eg: we shouldn't be building fedora-25 for the dsvm job | 16:28 |
mordred | pabelanger: with the deletions Shrews has been doing - it might be easier at this point to cherry-pick relevant testing fixes | 16:28 |
mordred | pabelanger: but it's worth a try for sure | 16:29 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: WIP: Improve job dependencies using graph instead of tree https://review.openstack.org/443973 | 16:29 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Update test config for job graph syntax https://review.openstack.org/444055 | 16:29 |
pabelanger | mordred: agree | 16:29 |
Shrews | also, we could skip instances whose state is BUILDING | 16:30 |
jeblair | Shrews: aha. i don't recall if that's the reason we did that originally or not. it may be. at any rate, when you make that change, it's probably worth a comment so that we don't have to learn this lesson (yet?) again. :) | 16:33 |
mordred | jeblair: the best lessons are the lessons you learn over and over again | 16:34 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool feature/zuulv3: Fix fedora 25 pause bug with devstack https://review.openstack.org/444400 | 16:34 |
pabelanger | mordred: Shrews: we actually just need ^ | 16:34 |
mordred | ++ | 16:35 |
Shrews | jeblair: k | 16:35 |
Shrews | Got a lunch appointment now, so will put up a fix when I return | 16:36 |
mordred | Shrews: silly food eating | 16:36 |
Shrews | mordred: it is needed energy-prep for tonight's game | 16:37 |
mordred | Shrews: yeah. good point | 16:38 |
mordred | I need to find more energy for that myself | 16:38 |
mordred | since each team won on their home court so far - I guess the question is - is NYC truly a second home court for Duke? | 16:38 |
jeblair | pabelanger: tobiash_ has some good comments on 438281. since that's what people seem to favor, do you want to address those and we can move the tox jobs along? | 16:41 |
jeblair | mordred: it's just like duke to have a second home in ny. | 16:42 |
mordred | jeblair: where else would they store all of their i-banker alums? | 16:48 |
*** Cibo_ has joined #zuul | 16:50 | |
jeblair | mordred, tobiash_: 443976 (graph) + 444055 (test config file updates) pass tests when combined together now. | 16:52 |
pabelanger | jeblair: toabctl: thanks, left reply / question. | 16:52 |
jeblair | so i think i'd like to get some provisional reviews on the first one and the spec, and then we're ready to merge, i'll squash them. | 16:53 |
jeblair | *when* we're ready to merge, that is | 16:53 |
mordred | jeblair: fwiw, I do not think squashing those two will make review harder - since there is only one file shared between the two | 16:59 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Add generic tox job (multiple playbooks) https://review.openstack.org/438281 | 17:02 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Add generic tox job (multiple playbooks) https://review.openstack.org/438281 | 17:03 |
jeblair | mordred: true... though there are a lot of files in the second one. | 17:03 |
mordred | that is true. there are a lot of files in the second one | 17:03 |
jeblair | pabelanger: are we going to fix the 'all' problem? | 17:04 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Add generic tox job (multiple playbooks) https://review.openstack.org/438281 | 17:04 |
*** Cibo_ has quit IRC | 17:05 | |
jeblair | pabelanger: looks like it. :) some of those still have xenial though | 17:05 |
pabelanger | jeblair: yes, let me find the error message | 17:05 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Add generic tox job (multiple playbooks) https://review.openstack.org/438281 | 17:06 |
pabelanger | actually, ^ should expose the problem in ansible log | 17:06 |
pabelanger | will be interesting moving forward on the tox jobs, my thoughts about setting hosts: ubuntu-xenial in the playbook, means a little more control which hosts jobs run on. I think its possible now, for a project to override the nodeset for the tox-py27 job and run on centos-7 lets say | 17:09 |
pabelanger | I guess the same is true that projects can redefine the job and use a new playbook | 17:09 |
jeblair | pabelanger: we can set "final: true" on any job where we don't want people to override things like nodesets | 17:10 |
pabelanger | great | 17:11 |
jeblair | (though, in general, zuulv3 is a bit more trusting that people want to do the right thing) | 17:11 |
pabelanger | ya | 17:12 |
pabelanger | either way, Yay, we have playbooks | 17:12 |
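jeblair's "final: true" suggestion, in the Zuul v3 job syntax as it later settled, looks roughly like this (job and label names illustrative):

```yaml
- job:
    name: tox-py27
    final: true            # variants/projects may not override attributes
    nodeset:
      nodes:
        - name: test-node
          label: ubuntu-xenial
```

A project attempting to override the nodeset of a final job would get a configuration error instead of silently running on a different platform.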
*** hashar has quit IRC | 17:20 | |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool feature/zuulv3: Remove allocator https://review.openstack.org/444425 | 17:23 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool feature/zuulv3: Remove jenkins_manager https://review.openstack.org/444427 | 17:25 |
jeblair | Shrews: approved https://review.openstack.org/443714 with comments | 17:26 |
pabelanger | +3 on deletes too | 17:28 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Re-enable test_alien_list_fail and alien_list cmd https://review.openstack.org/443714 | 17:29 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Remove job_list, job_create, job_delete cmds/tests https://review.openstack.org/444344 | 17:31 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Re-enable test_image_upload_fail https://review.openstack.org/444349 | 17:32 |
pabelanger | jeblair: Shrews: we have 5 locked ready nodes on nl01.o.o, do you mind looking? | 17:32 |
jeblair | sure thing | 17:32 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Re-enable test_nodepool_osc_config_reload https://review.openstack.org/444356 | 17:33 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Re-enable TestWebApp tests https://review.openstack.org/444358 | 17:33 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Rename osc to occ in tests https://review.openstack.org/444383 | 17:34 |
jeblair | pabelanger: http://paste.openstack.org/show/602292/ | 17:35 |
jeblair | pabelanger: the dump command has a section where it shows you what sessions hold ephemeral nodes | 17:35 |
jeblair | pabelanger: you can see that all the node locks are held by session 0x15aa4882759000c | 17:35 |
jeblair | pabelanger: and conveniently, that session also holds this node: /nodepool/launchers/nl01-5544-ProviderWorker.infracloud-chocolate | 17:37 |
jeblair | pabelanger: we know that launchers, when they come online, create an ephemeral node with their name so they know which other launchers are active | 17:37 |
jeblair | pabelanger: but it's convenient for us because we can see that it is the launcher which holds those node locks, without having to track down that session id some other way :) | 17:37 |
jeblair | pabelanger: it rather looks like the launcher has frozen? | 17:38 |
jeblair | pabelanger: i don't see any log entries for the past 30 mins | 17:38 |
mordred | jeblair: I think launchers freezing sounds unfun | 17:38 |
jeblair | (btw, we should rename nodepoold to nodepool-launcher) | 17:39 |
mordred | jeblair: ++ | 17:39 |
jeblair | i'm going to sigusr2 it to get a stack dump (i hope) | 17:39 |
jeblair | yay that worked | 17:39 |
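The SIGUSR2 trick jeblair uses relies on the daemon installing a handler that dumps every thread's stack on demand. A minimal hedged sketch of such a handler (not the actual nodepool code):

```python
import signal
import sys
import traceback


def stack_dump_handler(signum, frame):
    """Write the current stack of every thread to stderr.

    Roughly what a SIGUSR2 debug handler does; this is an
    illustrative sketch, not nodepool's implementation.
    """
    for thread_id, stack in sys._current_frames().items():
        sys.stderr.write("Thread: %s\n" % thread_id)
        traceback.print_stack(stack, file=sys.stderr)


# Install the handler; trigger it externally with: kill -USR2 <pid>
signal.signal(signal.SIGUSR2, stack_dump_handler)
```

Because the handler only reads frames and writes to stderr, it is safe to fire against a wedged process, which is exactly what makes it useful for diagnosing a frozen launcher.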
jeblair | okay, first of all -- did we port over that paramiko fix to v3? :) | 17:40 |
pabelanger | not sure | 17:41 |
* jeblair sorts relevant/irrelevant threads | 17:41 | |
SpamapS | mordred: jeblair fyi, I'm about to start writing a launcher security spec in earnest. I hope to have 1st draft shortly. If you have points you think we haven't discussed, now's a good time to poke them into my brain. | 17:42 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool feature/zuulv3: Add destructor to SSHClient https://review.openstack.org/444433 | 17:43 |
pabelanger | cherry-pick of paramiko^ | 17:44 |
jeblair | SpamapS: ++. i *think* we hit all the high points in chats at the ptg. | 17:44 |
jeblair | pabelanger: thanks! | 17:44 |
SpamapS | jeblair: me too, just making sure so I can reduce edits. :) | 17:47 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool feature/zuulv3: Handle exception edge cases in node launching https://review.openstack.org/444437 | 17:54 |
pabelanger | jeblair: so, back to locks and zookeeper. If I understand, for some reason nodepool-launcher didn't unlock the node. Which is what I see in the logs too | 17:55 |
jeblair | Shrews, pabelanger: i see the problem. the providerworker is responsible for determining when a launch is complete and releasing the node locks. however, it does that *after* starting new launches and, if needed, pausing new launches. the way it pauses is to busy-wait right in the middle of the code path between starting new launches and finalizing completed ones. in other words: it is paused indefinitely, waiting for nodes to be released. ... | 17:58 |
jeblair | ... they never will be because everyone is waiting on it to release them. | 17:58 |
jeblair | Shrews, pabelanger: in other other words, we may need to move the work that happens in _removeCompletedHandlers out of the ProviderWorker thread. | 17:59 |
jeblair | i will work on a test case for this | 17:59 |
mordred | jeblair: nice catch | 18:01 |
pabelanger | I see | 18:02 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Remove allocator https://review.openstack.org/444425 | 18:11 |
Shrews | jeblair: ick | 18:24 |
jeblair | Shrews: i'm making good progress on the test (it will take a while due to the complications you pointed out previously), but i haven't started on a solution yet. | 18:25 |
Shrews | that's such a silly mistake. Can't think of a good solution off the top of my head | 18:28 |
Shrews | maybe instead of waiting to fulfill the node set, short-circuit the fulfillment and release the node set we have, trying again later? | 18:30 |
Shrews | jeblair: also, regarding your comments on 443714... removing the database stuff and unused files was going to be last on my list after re-enabling all tests. | 18:34 |
jeblair | Shrews: that will cause large node requests (where large means >1) to starve at the expense of smaller ones when all providers are near quota | 18:34 |
Shrews | ah yeah, don't want that | 18:35 |
Shrews | pabelanger: oops, i re-enabled the image upload fail test and forgot we needed this: https://review.openstack.org/435481 | 18:39 |
Shrews | mordred: any chance you can +3 https://review.openstack.org/435481 for us? | 18:39 |
mordred | Shrews: yup | 18:46 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Disable CleanupWorker thread for test_image_upload_fail https://review.openstack.org/435481 | 18:52 |
Shrews | jeblair: congratulations. you encountered our first configuration file disagreement test failure: http://logs.openstack.org/27/444427/1/check/nodepool-coverage-ubuntu-xenial/185e0db/console.html#_2017-03-10_17_30_07_610345 | 18:54 |
Shrews | UploadWorker working from the old config, CleanupWorker working from the new | 18:54 |
Shrews | i don't know how to eliminate that totally | 18:55 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool feature/zuulv3: Add a failing test of node assignment at quota https://review.openstack.org/444462 | 18:55 |
jeblair | Shrews: ^ | 18:56 |
openstackgerrit | Jesse Keating proposed openstack-infra/zuul feature/zuulv3: Merge pull requests from github reporter https://review.openstack.org/444463 | 18:59 |
* SpamapS learning about the dark corners of cgroups, containers, and selinux. | 19:04 | |
Shrews | jeblair: can haz node_quota.yaml file? | 19:04 |
mordred | SpamapS: fun for you! | 19:05 |
jeblair | derp | 19:05 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool feature/zuulv3: Add a failing test of node assignment at quota https://review.openstack.org/444462 | 19:06 |
jeblair | Shrews: ^ | 19:06 |
dmsimard | SpamapS: even the bright corners of those are scary, good luck sir | 19:06 |
SpamapS | dmsimard: so true :) | 19:09 |
jeblair | Shrews: i think we may want shared locks on builds for that. get a (shared) read lock on an image to perform an upload, get a write lock on the image to delete it. | 19:15 |
Shrews | jeblair: yeah, well... kazoo doesn't have that | 19:15 |
jeblair | Shrews: https://zookeeper.apache.org/doc/r3.1.2/recipes.html#Shared+Locks | 19:15 |
jeblair | doesn't look too hard | 19:16 |
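The ZooKeeper recipe jeblair links implements shared locks by ordering ephemeral sequence znodes under a lock path. Stripped of ZooKeeper, the read/write semantics he wants (many uploaders holding the shared side at once, a deleter needing exclusive access) look roughly like this in-process sketch (illustrative only; no writer-starvation handling):

```python
import threading


class SharedLock:
    """Readers-writer lock sketch mirroring the shared/exclusive
    semantics discussed for image builds: uploads take the shared
    (read) side, deletion takes the exclusive (write) side.
    Not the ZooKeeper recipe itself, just its semantics."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        with self._cond:
            while self._writer:          # uploads wait out a deleter
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:       # last upload done: wake deleter
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()        # deletion excludes everyone
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```

In the ZooKeeper version, the same ordering falls out of comparing sequence numbers: a read request only waits on lower-numbered write znodes, while a write request waits on every lower-numbered znode.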
Shrews | jeblair: https://github.com/python-zk/kazoo/pull/306 | 19:16 |
Shrews | at least mordred approved the PR! :) | 19:17 |
jeblair | i read that as mordred volunteering to fix it | 19:17 |
mordred | heh | 19:18 |
jeblair | after lunch i will look at what is required to get that pr into shape | 19:18 |
mordred | jeblair: I think mostly it's just about yelling at harlowja right? | 19:19 |
jeblair | (before lunch, i will try to figure out how to use github) | 19:19 |
harlowja | lol | 19:19 |
harlowja | whatt | 19:19 |
mordred | jeblair: good luck ... it's a pretty shitty ui | 19:19 |
Shrews | harlowja: the kazoo shared locks thing came up again | 19:20 |
mordred | harlowja: we've hit the point where https://github.com/python-zk/kazoo/pull/306 is gonna be important - so jeblair is going to work on it | 19:20 |
harlowja | uh oh | 19:20 |
harlowja | sweet | 19:20 |
pabelanger | I too like to live dangerously | 19:20 |
harlowja | mordred bbangert is more senior in that library than me, so once he's happy i'm happy | 19:20 |
harlowja | then i can click merge | 19:20 |
harlowja | lol | 19:20 |
mordred | jeblair: the fun part is that you can't submit patches to that PR - you'll have to pull those patches into your own repo and put up a completely different pr | 19:20 |
jeblair | i what? | 19:21 |
jeblair | how do i collaborate? | 19:21 |
mordred | jeblair: you don't | 19:21 |
mordred | you fork | 19:21 |
jeblair | but i thought github was about collaborating with people? | 19:21 |
mordred | github is all about celebrating the individual ego | 19:21 |
harlowja | none of that collaboration crap | 19:21 |
harlowja | lol | 19:21 |
jeblair | harlowja: you may get an email with a diff. | 19:21 |
harlowja | not sure if i can update someone else's PR either | 19:21 |
harlowja | lol | 19:21 |
Shrews | jeblair: that seems like a LHF task. i'd personally rather see you on zuul things and maybe get a volunteer for the kazoo patch | 19:22 |
mordred | nope. the only way to update a PR is to push more patches to the branch the PR is a request to merge | 19:22 |
Shrews | jeblair: on the other hand, i'm almost done with nodepool, so.... | 19:22 |
mordred | Shrews offers to jump on the GH grenade | 19:22 |
harlowja | i already jumped on the manifesto grenade | 19:23 |
harlowja | lol | 19:23 |
Shrews | well, i'm hoping to NOT have to... just would rather see jeblair's scarce time better used | 19:23 |
Shrews | doesn't SpamapS have some new resources to point at things like that? ;) | 19:25 |
Shrews | jeblair: mordred: SpamapS: I propose we put this as a topic for the next zuul meeting and see who has the time to see that PR through. Unless Jim just REALLY wants to work on it. | 19:31 |
jeblair | Shrews: i'm curious enough to spend a few minutes on it, but i'll give myself a short timeout. :) | 19:31 |
Shrews | jeblair: fair enough | 19:31 |
*** bhavik1 has joined #zuul | 19:33 | |
jeblair | "test_dirty_sock" | 19:34 |
*** bhavik1 has quit IRC | 19:37 | |
jeblair | Shrews: the only ideas coming to my head so far for the deadlock are: 1) have a new thread-per-provider to handle the nodelauncher poll/cleanup. 2) set an attribute on the providerworker indicating we are paused so we can proceed through the main loop without accepting new requests. | 19:37 |
Shrews | jeblair: yeah. i'm going to experiment with #2 | 19:39 |
Shrews | jeblair: wow. just what timing issues has your https://review.openstack.org/444427 review exposed? New failures each run | 19:43 |
* Shrews tries hard to avoid looking at two problems at once | 19:43 | |
SpamapS | wha hoo? | 19:44 |
SpamapS | interesting | 19:51 |
SpamapS | lxc in its original form is EOL | 19:51 |
SpamapS | lxc 2.0 is just lxd in local-socket-comm-only mode. | 19:51 |
SpamapS | (well technically LXC 1.x is supported until 2019, but that's effectively dead to me :) | 19:51 |
openstackgerrit | K Jonathan Harker proposed openstack-infra/nodepool master: Write per-label nodepool demand info to statsd https://review.openstack.org/246037 | 19:59 |
jeblair | Shrews: that was a removal of a file that isn't used or even loaded by anything, carefully crafted to expose subtle timing errors. apparently. | 20:07 |
SpamapS | https://review.openstack.org/444495 <-- Security spec 1st draft | 20:33 |
SpamapS | It's pretty light on details. I think we'll want to have subject matter experts weigh in on things. | 20:33 |
mordred | jesusaur: ^^ your patch there - Shrews just reworked statsd reporting in the v3 nodepool fwiw | 20:40 |
pabelanger | mordred: jeblair: speaking of statsd, any reason not to +3 444363? currently has 2 +2 | 20:42 |
mordred | pabelanger: nope | 20:43 |
*** hashar has joined #zuul | 20:45 | |
jesusaur | Shrews: how drastically has nodepool statsd reporting changed? | 20:47 |
Shrews | jesusaur: take a look at the StatsReporter class in nodepool.py | 20:47 |
jeblair | well, the main thing relevant to that change is that the allocator is completely gone | 20:53 |
Shrews | jeblair: I think I have a fix for the deadlock. But the leaked instance race keeps biting me, so I'm going to now fix that and put that up ahead of it. | 21:09 |
jeblair | kk | 21:12 |
Shrews | jeblair: actually, i'll just put out what i have and rebase your review when i have it | 21:13 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Fix failure of node assignment at quota https://review.openstack.org/444462 | 21:13 |
Shrews | jeblair: ^^^ | 21:13 |
Shrews | jeblair: tl;dr, I've made the NodeRequestHandler code re-entrant so that it maintains state across calls to run() | 21:14 |
Shrews | jeblair: when ProviderWorker is paused, it just drains the handlers that are waiting on nodes until they're all finished | 21:15 |
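(Editor's aside: "re-entrant so that it maintains state across calls to run()" means roughly the following shape — a hypothetical minimal sketch, not the actual NodeRequestHandler API.)

```python
class NodeRequestHandler:
    """Illustrative sketch: run() may be called repeatedly and makes
    incremental progress each time, so a paused ProviderWorker can keep
    polling its outstanding handlers until they all drain."""

    def __init__(self, nodes_needed):
        self.nodes_needed = nodes_needed
        self.launched = 0
        self.done = False

    def run(self):
        # Each call does one unit of work instead of blocking until
        # the whole request is satisfied.
        if self.launched < self.nodes_needed:
            self.launched += 1  # stand-in for "launch one node"
        self.done = self.launched >= self.nodes_needed
        return self.done
```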
* jeblair pauses kazoo timer and context switches | 21:16 | |
Shrews | mordred: i have no idea why your nodepool metadata change is in merge conflict. it applied cleanly for me on top of current feature/zuulv3 | 21:25 |
jeblair | Shrews: i get it. i left a couple comments. | 21:31 |
Shrews | jeblair: thx | 21:39 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Stop json-encoding the nodepool metadata https://review.openstack.org/410812 | 21:41 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Use node ID for instance leak detection https://review.openstack.org/444508 | 21:41 |
Shrews | mordred: rebased for you | 21:41 |
mordred | Shrews: thanks! | 21:45 |
openstackgerrit | Jesse Keating proposed openstack-infra/zuul feature/zuulv3: support github pull request labels https://review.openstack.org/444511 | 21:47 |
jeblair | harlowja: while i'm in there, would you prefer RLock/WLock to be called ReadLock/WriteLock ? | 21:48 |
harlowja | doesn't matter to me | 21:48 |
* jeblair paints bikesheds already painted | 21:49 | |
harlowja | match python threading stuff? | 21:49 |
harlowja | that'd be fine with me | 21:49 |
harlowja | oh wait, python doesn't have it | 21:49 |
harlowja | match the other library i made that does have it, lol | 21:49 |
harlowja | ReadLock/WriteLock matches more of what i did there | 21:50 |
harlowja | in https://github.com/harlowja/fasteners/blob/master/fasteners/lock.py#L100 | 21:50 |
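(Editor's aside: the fasteners library linked above exposes this as `fasteners.ReaderWriterLock`, with `read_lock()`/`write_lock()` context managers. A toy pure-stdlib equivalent, for illustration only, looks like this — readers share, writers exclude everyone.)

```python
import threading

class ReadWriteLock:
    """Toy reader-writer lock (illustration only, not production code):
    many readers may hold the lock at once; a writer waits until there
    are no readers and no other writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```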
jeblair | cool, i like that better. i have obtained validation. :) | 21:50 |
harlowja | u can go read the upstream python change for that | 21:50 |
harlowja | they bikesheded all over that | 21:50 |
harlowja | http://bugs.python.org/issue8800 from what i remember | 21:51 |
harlowja | 'Seems to have fizzled out due to the intense amount of bikeshedding required.' | 21:51 |
harlowja | lol | 21:51 |
harlowja | http://bugs.python.org/issue8800#msg274795 (wasn't kidding) | 21:52 |
jeblair | wow, a complete example of the law of triviality! https://en.wikipedia.org/wiki/Law_of_triviality | 21:53 |
jeblair | the term references the act of killing an idea with trivial arguments | 21:53 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Fix failure of node assignment at quota https://review.openstack.org/444462 | 21:53 |
jeblair | i think we usually use it to inoculate ourselves against that | 21:54 |
harlowja | ya, so i read over that bug a long time ago, was like wtf, and then just made something | 21:54 |
harlowja | too much bikeshed for me | 21:54 |
harlowja | lol | 21:54 |
harlowja | feels bad that such patch started in 2010 | 21:55 |
harlowja | :( | 21:55 |
harlowja | poor author | 21:55 |
harlowja | sometimes on such threads i want to punch the other commenters in the nuts and tell them to have some feelings | 21:57 |
harlowja | but i can't say such things on such threads | 21:57 |
harlowja | lol | 21:57 |
Shrews | Ok, I have fixes up for the big flaws found today. Going to call it a week and prepare for the sportsball things. Night all | 22:05 |
jeblair | harlowja: okay, clear your schedule. i'm preparing a pr for you. :) | 22:06 |
jeblair | Shrews: goodnight! happy sportsball! | 22:06 |
jeblair | harlowja: as soon as i figure out how. you may have a few minutes. :) | 22:07 |
harlowja | lol | 22:07 |
harlowja | counting down | 22:07 |
jeblair | harlowja: https://github.com/python-zk/kazoo/pull/419 make magic happen! | 22:11 |
harlowja | yes sir | 22:11 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool feature/zuulv3: Store a pointer to the paused node request handler https://review.openstack.org/444520 | 22:33 |
jeblair | Shrews: +2 with an option ^ | 22:34 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Add destructor to SSHClient https://review.openstack.org/444433 | 22:37 |
jeblair | harlowja: hrm, it looks like the election tests are hanging | 22:44 |
harlowja | poopie | 22:44 |
jeblair | that doesn't make sense to me; i'm looking into it | 22:44 |
harlowja | travis and this stuff has always been let's say unreliable | 22:44 |
harlowja | so it may be travis fault, but may not, ha | 22:44 |
harlowja | but ya, looks like election something or other | 22:45 |
harlowja | afaik election stuff is just using a lock | 22:45 |
jeblair | i think i can reproduce it locally, and i'm pretty sure i had a full working run before starting | 22:45 |
harlowja | k | 22:45 |
harlowja | https://github.com/python-zk/kazoo/blob/master/kazoo/recipe/election.py#L53-L54 | 22:46 |
harlowja | lol | 22:46 |
harlowja | that whole recipe is funny | 22:46 |
harlowja | lol | 22:46 |
jeblair | harlowja: derp, i think i see the issue | 22:46 |
harlowja | almost feels like it shouldn't exist | 22:46 |
harlowja | since it adds about no logic to the lock recipe | 22:46 |
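(Editor's aside: the point that the election recipe "adds about no logic to the lock recipe" can be shown with a hypothetical miniature — leader election is just "acquire the lock, run the callback while holding it". kazoo's real Election wraps its ZooKeeper-backed Lock in much the same way; the stand-in Lock below exists only so the sketch runs without a ZooKeeper server.)

```python
class Lock:
    """Tiny stand-in lock so the sketch is self-contained."""
    def __init__(self):
        self.held = False
    def __enter__(self):
        self.held = True
        return self
    def __exit__(self, *exc):
        self.held = False

class Election:
    """Illustrative sketch: whoever acquires the lock is the leader,
    and runs the supplied function while holding it."""
    def __init__(self, lock):
        self.lock = lock
    def run(self, func, *args, **kwargs):
        with self.lock:  # block until we are the leader
            return func(*args, **kwargs)
```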
harlowja | lol | 22:46 |
jeblair | harlowja: okay, fixed locally... | 22:52 |
jeblair | harlowja: while i'm here -- when i looked at the diff, i noticed a couple of minor things i can fix up... | 22:52 |
harlowja | u can do it | 22:52 |
jeblair | harlowja: there used to be a Lock._NODE_NAME. the previous patch got rid of that in favor of passing a variable to the constructor; i'm making it a class attribute again, but i inadvertently called it Lock.node_name. would you prefer Lock.node_name, Lock.NODE_NAME, or Lock._NODE_NAME? (note that the subclasses override this variable, but otherwise we don't expect users to touch it). | 22:54 |
jeblair | harlowja: https://github.com/python-zk/kazoo/pull/419/files#diff-a08f51f50ea54f2f8138ab6045dc59c0L72 | 22:55 |
jeblair | for context | 22:55 |
jeblair | and https://github.com/python-zk/kazoo/pull/419/files#diff-a08f51f50ea54f2f8138ab6045dc59c0R406 | 22:55 |
harlowja | hmmmm | 23:00 |
harlowja | i'll let u pick | 23:00 |
jeblair | i'll go with _NODE_NAME and hope that conveys "protected class attribute" :) | 23:02 |
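(Editor's aside: the chosen pattern — a leading-underscore class attribute that subclasses override but users don't touch — looks like this. The sketch is illustrative; the node-name strings are assumptions, not necessarily the exact values in the kazoo PR.)

```python
class Lock:
    """Illustrative sketch of the '_NODE_NAME as protected class
    attribute' pattern: the underscore signals 'not public API, but
    subclasses may override'."""

    _NODE_NAME = "__lock__"

    def prefix(self):
        # Contender znodes are prefixed with the class's node name,
        # so read and write contenders are distinguishable under the
        # same lock path.
        return self._NODE_NAME

class ReadLock(Lock):
    _NODE_NAME = "__rlock__"

class WriteLock(Lock):
    _NODE_NAME = "__wlock__"
```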
harlowja | wfm | 23:02 |
jeblair | harlowja: https://github.com/python-zk/kazoo/pull/419/ updated | 23:03 |
jeblair | harlowja: builds are passing this time (with the exception of gevent which is failing to install; pretty sure that's not my fault). | 23:06 |
harlowja | kk | 23:06 |
*** rahsah has quit IRC | 23:11 | |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Remove jenkins_manager https://review.openstack.org/444427 | 23:21 |
*** saneax-_-|AFK is now known as saneax | 23:30 | |
mordred | harlowja: assuming that PR from jeblair is good, how long do you think it would take to land and get into a release? | 23:38 |
harlowja | once benbangert i guess checks it? | 23:39 |
harlowja | then i can make a release pretty quickly | 23:39 |
mordred | cool! | 23:39 |
* mordred hands harlowja a pie | 23:39 | |
harlowja | now u just have to accept my manifesto | 23:39 |
harlowja | lol | 23:39 |
* harlowja takes pie before it gets taken away | 23:40 | |
harlowja | lol | 23:40 |
mordred | :) | 23:40 |
* harlowja runs away with pie | 23:40 | |
* mordred starts handing harlowja thousands of pies | 23:41 | |
* harlowja dies | 23:41 | |
mordred | o noes! | 23:42 |
* rbergeron recommends not overdosing nice humans on the pie | 23:46 | |
mordred | MOAR PIE FOR EVERYONE | 23:53 |
Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!