*** hashar has quit IRC | 00:02 | |
*** saneax is now known as saneax-_-|AFK | 00:56 | |
*** Shuo has joined #zuul | 01:09 | |
Shuo | For the dashboard showing the "Testing Nodes", it shows the number of VMs in use. Can I somehow translate that into bare-metal machine capacity (if someone starts thinking about the budgeting aspect of such a problem)? | 01:13 |
clarkb | Shuo: we don't have insight into how our cloud providers pack and oversubscribe. But each of our test instances is 8 vCPU x 8GB RAM x at least 80GB disk | 01:24 |
Shuo | clarkb: I would imagine the virtual-to-physical virtualization factor for CPU can be pretty big (let's say 2:1 for now), but the memory might be 1:1. So I would imagine 64GB + 16 CPUs can give us 8 VMs. So my ballpark guesstimate (I don't need to be precise for this :-) ) is that 200 such physical machines can serve ~1500 build VMs, which is not too scary from a budgeting perspective. | 02:11 |
mordred | jeblair: shade/nodepool gate fixed - feel free to +A those two changes | 02:12 |
mordred | Shuo: yes - I believe we have guesstimated similar numbers | 02:14 |
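For anyone following the budgeting thread, here is a minimal sketch of the back-of-envelope math above. The test-VM size is taken from clarkb's figures; the host specs and oversubscription ratios are illustrative assumptions, not measured values.

```python
# Back-of-envelope capacity estimate. The test-VM size (8 vCPU / 8GB RAM /
# >=80GB disk) is quoted above; host size and packing ratios are assumptions.
VM_VCPU, VM_RAM_GB, VM_DISK_GB = 8, 8, 80

HOST_CORES, HOST_RAM_GB, HOST_DISK_GB = 16, 64, 1000   # hypothetical host
CPU_OVERSUB = 2.0    # assumed 2:1 vCPU-to-core packing
RAM_OVERSUB = 1.0    # assumed 1:1 memory packing

by_cpu = int(HOST_CORES * CPU_OVERSUB // VM_VCPU)
by_ram = int(HOST_RAM_GB * RAM_OVERSUB // VM_RAM_GB)
by_disk = HOST_DISK_GB // VM_DISK_GB

# The effective per-host capacity is whichever resource runs out first.
print("per-host limits:", {"cpu": by_cpu, "ram": by_ram, "disk": by_disk})
print("VMs per host:", min(by_cpu, by_ram, by_disk))
```

With these particular assumptions CPU is the binding constraint at 4 VMs per host; the ~1500-VM estimate above effectively assumes memory is the limit.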
Shuo | mordred: thanks :-) | 02:17 |
Shuo | mordred: how did we plan for network proximity? (I heard that those physical machines are located in different vendors' DCs, and they are continents away from each other?) | 02:24 |
Shuo | my question is more or less about how we move the bits around (say, if we need some new version of an image, which might be a big chunk of bits), are we potentially experiencing network bandwidth problems? | 02:28 |
mordred | Shuo: ah - so ... we do have that situation (we have at least one cloud region in europe) | 02:36 |
mordred | in our case, we deal with everything in terms of regions of clouds and treat regions as physically disparate, even if they are regions from the same provider | 02:36 |
mordred | this means we have to upload new images to each region when we make them (which for us is daily, and many images, and is a LOT of bandwidth) | 02:37 |
mordred | it also means that we have implemented a mirror infrastructure which has per-region mirrors of frequently fetched artifacts | 02:37 |
mordred | and we have scripts that run when a node reaches ready state that set local config on the node to point to the mirror in the same region as the node | 02:38 |
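A minimal sketch of the kind of ready-time configuration script mordred describes, assuming a hypothetical per-region mirror naming scheme and environment variable; the real infra scripts and hostnames may differ.

```python
# Sketch only: point a freshly-launched node at a mirror in its own region.
# "NODEPOOL_REGION" and the mirror hostname pattern are hypothetical.
import os

region = os.environ.get("NODEPOOL_REGION", "region-1")
mirror = f"mirror.{region}.example.org"

pip_conf = f"""[global]
index-url = https://{mirror}/pypi/simple
"""

def configure_node():
    # Write pip config so jobs fetch packages in-region instead of across
    # the internet; similar rewrites would apply to apt/yum sources, etc.
    with open("/etc/pip.conf", "w") as f:
        f.write(pip_conf)

if __name__ == "__main__":
    configure_node()
```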
Shuo | mordred: could you elaborate a bit on "a LOT of bandwidth" by giving a back-of-envelope calculation? | 02:38 |
mordred | Shuo: well - we don't pay for bandwidth, so we have made some choices with that in mind ... | 02:38 |
mordred | Shuo: each of our images is in the 8G range | 02:39 |
Shuo | mordred: if you need to update (push) new images to each region once a day, how does it cost "A LOT" of bandwidth? | 02:39 |
mordred | and we have ... 6 images and 14 regions of cloud | 02:40 |
mordred | so we push 672G of image updates to our clouds daily from our image build machines | 02:41 |
mordred | _roughly_ | 02:41 |
mordred | the number is slightly higher because the rackspace regions take vhd format which is about twice as big | 02:41 |
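Making the arithmetic above explicit:

```python
# Daily image-upload volume from the figures quoted above.
IMAGES = 6
REGIONS = 14
IMAGE_SIZE_GB = 8

print(IMAGES * REGIONS * IMAGE_SIZE_GB, "GB/day")  # 672 GB/day, before the
                                                    # ~2x vhd overhead for the
                                                    # rackspace regions
```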
Shuo | mordred: is 20 images a good number to think of for that problem? if so, 8GB/image * 20 = 160 GB (~1280 Gbit) toward a 'region' | 02:41 |
mordred | I honestly think 20 is a bit high - but it would _certainly_ be a good high-water mark | 02:42 |
Shuo | mordred: thanks, you already gave me the answer to my previous question | 02:42 |
mordred | \o/ | 02:42 |
mordred | Shuo: fwiw, we build new images and they are that size because we pre-cache a bunch of things in the image in an attempt to reduce internet traffic during jobs | 02:44 |
mordred | it's possible that for another installation with different usage characteristics than ours, a different tradeoff might be desired | 02:44 |
Shuo | mordred: I guess the above is the main bandwidth cost, right? for each individual build/test job, we only need to pull down the python code from github, which is tiny, right? | 02:45 |
mordred | well - you'd think - but the nova repository is actually quite large (as are a few of the others) | 02:46 |
mordred | so we actually have copies of every git repo we deal with in our base images so that the only thing we're fetching at job time is the proposed changes (and any other changes that might have landed that day) | 02:46 |
mordred | we also don't clone from github - the failure rate is too high | 02:46 |
mordred | we run a farm of 8 git mirrors behind a load balancer - although that is currently centralized | 02:47 |
mordred | we have plans to investigate per-region git mirrors, but haven't yet done that | 02:47 |
Shuo | mordred: hmm, then how do we manage codeline consistency? | 02:47 |
mordred | Shuo: what do you mean? | 02:48 |
Shuo | mordred, let me try to recap what you said.... | 02:48 |
Shuo | mordred: first, we have a once-a-day network consumption to push the daily-built new image to each region; | 02:49 |
mordred | yes | 02:49 |
Shuo | mordred: then, we have a per-build/test code pull (and I thought it was a tiny amount of traffic, but you said it could also be a huge amount. Ok, let's make a parking lot for that for now and come back to it later). And you said you have 8 git servers for the different regions to pull from, is this a correct understanding? | 02:52 |
mordred | yes. BUT - it _should_ be a tiny amount of traffic, as it should only at most be a daily delta from what we cached in our image | 02:53 |
mordred | the main reason I mention it is that our particular approach to pre-caching and mirroring might be overkill for you - and if it is, it could make those image sizes smaller | 02:53 |
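A minimal sketch of the "pre-cache in the image, fetch only the delta at job time" idea described above. The cache path and mirror URL are illustrative, not the actual infra layout.

```python
import subprocess

CACHE = "/opt/git/openstack/nova"                           # repo baked into the image
MIRROR = "https://git-mirror.example.org/openstack/nova"    # hypothetical mirror URL
WORKSPACE = "/home/zuul/workspace/nova"

def prepare_repo():
    # Cloning from the on-disk cache costs no network traffic.
    subprocess.check_call(["git", "clone", CACHE, WORKSPACE])
    # Only commits that landed since the image was built cross the network.
    subprocess.check_call(["git", "-C", WORKSPACE, "fetch", MIRROR])

if __name__ == "__main__":
    prepare_repo()
```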
Shuo | mordred: "run a farm of 8 git mirrors...", how are these 8 mirrors sync-ed with the primary git? (I just wonder if there exist a tiny window when a mirror git does not hold a commit when the test job asks for it) | 02:57 |
Shuo | mordred: not sure if I asked my question clearly above -- let me know if I did not | 02:58 |
mordred | Shuo: our gerrit server replicates to them on push - so yes, there is a possible race, but the zuul cloning takes that into account | 02:58 |
Shuo | mordred: let me try to describe the different git servers' sync workflow here, and please correct me if I am not making the right description... | 03:00 |
Shuo | mordred: 1) the code passing the final gating test gets merged into HEAD on the Gerrit git server; 2) then the Gerrit git server pushes that commit to the 8 git replica servers; 3) any build/test job fetches from one of the replica git servers; 4) if a particular commit is not available (yet) on the replica git, zuul might wait for some time and retry, I guess. Close enough? ;-) | 03:03 |
mordred | yes. that is it | 03:05 |
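A minimal sketch of tolerating the replication race in step 4: if the replica has not received the ref yet, retry a few times before giving up. The retry count and delay are illustrative; Zuul's actual cloner has its own logic.

```python
import subprocess
import time

def fetch_with_retry(workspace, mirror, ref, attempts=5, delay=10):
    for attempt in range(attempts):
        try:
            subprocess.check_call(["git", "-C", workspace, "fetch", mirror, ref])
            return
        except subprocess.CalledProcessError:
            if attempt == attempts - 1:
                raise
            time.sleep(delay)  # give gerrit->mirror replication a moment to catch up
```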
mordred | clarkb: ^^ please correct me if I've lied to Shuo ... my brain doesn't always work perfectly | 03:05 |
Shuo | is github.com/openstack/A-project one of the 8 replicas? or is it the 9th? | 03:06 |
Shuo | mordred: ^^ | 03:13 |
mordred | Shuo: it's the 9th | 03:17 |
mordred | we don't actually use github for anything - we just replicate there for dev convenience | 03:18 |
mordred | Shuo: my flight is landing, so I'm going to afk ... | 03:19 |
Shuo | mordred: kk, thanks for sharing... | 03:20 |
Shuo | mordred: regarding the github question, I was just trying to understand if it's purely a push (in this sense it is the same as the other 8 replicas) from the gerrit git server (the 'master' git) | 03:21 |
*** Shuo has quit IRC | 03:37 | |
*** bhavik1 has joined #zuul | 04:39 | |
*** bhavik1 has quit IRC | 04:43 | |
*** saneax-_-|AFK is now known as saneax | 05:17 | |
*** Cibo_ has joined #zuul | 05:28 | |
*** bhavik1 has joined #zuul | 05:31 | |
*** bhavik1 has quit IRC | 05:56 | |
*** saneax is now known as saneax-_-|AFK | 08:10 | |
*** saneax-_-|AFK is now known as saneax | 08:18 | |
*** hashar has joined #zuul | 08:32 | |
*** saneax is now known as saneax-_-|AFK | 09:11 | |
*** saneax-_-|AFK is now known as saneax | 09:33 | |
*** pabelanger has quit IRC | 09:35 | |
*** wznoinsk has quit IRC | 09:35 | |
*** mmedvede has quit IRC | 09:35 | |
*** wznoinsk has joined #zuul | 09:35 | |
*** pabelanger has joined #zuul | 09:35 | |
*** mmedvede has joined #zuul | 09:36 | |
*** rbergeron has quit IRC | 09:37 | |
*** rbergeron has joined #zuul | 09:37 | |
*** hogepodge has quit IRC | 09:38 | |
*** SpamapS has quit IRC | 09:38 | |
*** hogepodge has joined #zuul | 09:39 | |
*** openstack has joined #zuul | 14:31 | |
*** saneax-_-|AFK is now known as saneax | 14:32 | |
*** saneax is now known as saneax-_-|AFK | 14:36 | |
*** saneax-_-|AFK is now known as saneax | 14:40 | |
*** Cibo_ has quit IRC | 15:00 | |
openstackgerrit | Lenny Verkhovsky proposed openstack-infra/nodepool: Fixed typo in info msg https://review.openstack.org/418435 | 15:08 |
*** mptacekx has quit IRC | 16:05 | |
*** saneax is now known as saneax-_-|AFK | 16:55 | |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Remove unsed variables https://review.openstack.org/418492 | 17:12 |
*** herlo has quit IRC | 17:22 | |
*** herlo has joined #zuul | 17:27 | |
*** herlo has joined #zuul | 17:27 | |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: [WIP] Support AFS mirrors for nodepool diskimages https://review.openstack.org/414273 | 17:37 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Allow nodepool-builder to only build diskimages https://review.openstack.org/412160 | 17:41 |
*** hashar has quit IRC | 18:02 | |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Remove the ability for nodepoold to launch a builder https://review.openstack.org/418137 | 18:04 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Allow nodepool-builder to only build diskimages https://review.openstack.org/412160 | 18:18 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Allow nodepool-builder to only build diskimages https://review.openstack.org/412160 | 18:31 |
*** Shuo has joined #zuul | 18:34 | |
openstackgerrit | Merged openstack-infra/nodepool: Source glean installs in simple-init https://review.openstack.org/414662 | 18:48 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: [WIP] Support AFS mirrors for nodepool diskimages https://review.openstack.org/414273 | 19:00 |
*** klindgren has joined #zuul | 19:45 | |
harlowja | rbergeron would you be the person I can perhaps ask ansible tower questions, sorta new to it, and klindgren and I had some questions about functionality | 19:58 |
harlowja | (nothing crazy hard) | 19:58 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul: Separate driver interfaces and make abstract https://review.openstack.org/418554 | 20:00 |
rbergeron | harlowja: not really (being the community gal, and it's not open source, so I've had super limited interaction with it -- | 20:02 |
rbergeron | but happy to find the right person :) | 20:02 |
jeblair | jhesketh, jamielennox, jlk, SpamapS: ^ the parent of 418554 is the restructuring into drivers change which you may recall; 418554 is all about making a nice api for driver implementors. it's one approach we could take. please check it out and let me know if you like that direction, or think something else would be better. | 20:03 |
harlowja | rbergeron thx, klindgren I think had a question, hopefully he remembers (pattern matching related or something) | 20:28 |
klindgren | basically does ansible tower have a rules engine? As an example can I feed it notification of alerts from a system, and it be able to do matching against those alerts to trigger specific jobs? Or is the expectation that via the API that something else needs to do that work and just call a defined playbook? | 20:31 |
harlowja | (or perhaps said thing is a planned/but not yet feature) | 20:39 |
Shrews | harlowja: is there a safe way to delete a lock node after it is released? | 20:46 |
Shrews | harlowja: with kazoo, that is | 20:47 |
harlowja | hmmm, do you know if it's unused by others? | 20:48 |
harlowja | if known unused, ya, just delete the lock directory | 20:50 |
Shrews | harlowja: well now, that's the issue :) | 20:50 |
harlowja | if not known unused | 20:50 |
Shrews | i was hoping there was an option to release() that would delete it before unlocking it, but alas, no | 20:50 |
harlowja | ya, that'd involve a little bit more, cause locks at least via kazoo aren't just single nodes | 20:51 |
harlowja | they are directories | 20:51 |
Shrews | jeblair: i think we may need a cleanup thread in nodepoold to clean up request locks that are older than X | 20:51 |
jeblair | Shrews: yeah, or lock nodes that are for request nodes that don't exist any more | 20:51 |
harlowja | so a release() method would almost need to drop a "this_lock_is_dead" file and wake other waiters to tell them to drop their waiting | 20:52 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: [WIP] Support AFS mirrors for nodepool diskimages https://review.openstack.org/414273 | 20:53 |
harlowja | Shrews but idea, try to delete the lock directory | 20:55 |
harlowja | i don't think https://github.com/python-zk/kazoo/blob/master/kazoo/recipe/lock.py#L184 will handle it nicely, but one way to find out | 20:55 |
harlowja | lol | 20:55 |
Shrews | harlowja: we'll just do the cleanup thread idea :) | 20:56 |
harlowja | k | 20:56 |
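A minimal sketch of the cleanup-thread idea Shrews and jeblair settle on above: periodically delete lock znodes that have no contenders and are older than a threshold. The lock root path and age limit are hypothetical, not nodepool's actual values.

```python
import time
import threading
from kazoo.client import KazooClient
from kazoo.exceptions import NoNodeError, NotEmptyError

LOCK_ROOT = "/nodepool/requests-lock"   # hypothetical lock root
MAX_AGE = 8 * 3600                      # seconds; illustrative threshold

def cleanup_stale_locks(zk):
    now = time.time()
    for name in zk.get_children(LOCK_ROOT):
        path = f"{LOCK_ROOT}/{name}"
        try:
            _, stat = zk.get(path)
            # Kazoo locks are directories; an empty, old one has no contenders.
            if not zk.get_children(path) and now - stat.mtime / 1000.0 > MAX_AGE:
                zk.delete(path)   # non-recursive: fails if a contender appeared
        except (NoNodeError, NotEmptyError):
            pass  # someone grabbed or removed the lock meanwhile; skip it

def cleanup_loop(zk, interval=300):
    while True:
        cleanup_stale_locks(zk)
        time.sleep(interval)

if __name__ == "__main__":
    zk = KazooClient(hosts="127.0.0.1:2181")
    zk.start()
    threading.Thread(target=cleanup_loop, args=(zk,), daemon=True).start()
    time.sleep(3600)  # keep the demo process alive
```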
rbergeron | klindgren / harlowja: will ask. no idea. :) | 21:09 |
harlowja | :) | 21:09 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: Add framework for handling node requests https://review.openstack.org/418585 | 21:45 |
jhesketh | Morning | 21:46 |
Shrews | jeblair: hrm, i just noticed that we've made some changes to nodepool/zk.py after making the latest features/zuulv3 branch. i guess we can merge those in later, but we'll probably get conflicts. | 21:46 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: Add framework for handling node requests https://review.openstack.org/418585 | 21:55 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: Add framework for handling node requests https://review.openstack.org/418585 | 21:58 |
harlowja | Shrews how's the overall zookeeper stuff going btw? | 21:59 |
Shrews | harlowja: pretty decently so far. the current nodepool image builder we run in infra production is using it pretty successfully | 22:01 |
harlowja | cool | 22:10 |
pabelanger | much better than gearman at this point. Zero backed-up builds | 23:11 |
*** saneax-_-|AFK is now known as saneax | 23:17 | |
*** morgan has quit IRC | 23:23 | |
*** morgan_ has joined #zuul | 23:38 | |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: [WIP] Support AFS mirrors for nodepool diskimages https://review.openstack.org/414273 | 23:40 |
openstackgerrit | Merged openstack-infra/nodepool: Remove the ability for nodepoold to launch a builder https://review.openstack.org/418137 | 23:41 |