*** hashar has quit IRC | 00:02 | |
*** saneax is now known as saneax-_-|AFK | 00:56 | |
*** Shuo has joined #zuul | 01:09 | |
Shuo | For the dashboard showing the "Testing Nodes", it shows the number of VMs in use. Can I somehow translate that into bare-metal machine capacity (if someone starts thinking about the budgeting aspect of such a problem)? | 01:13 |
clarkb | Shuo: we don't have insight into how our cloud providers pack and oversubscribe. But each of our test instances is 8 vCPU x 8GB RAM x at least 80GB disk | 01:24 |
Shuo | clarkb: I would imagine the virtual-to-physical virtualization factor for CPU can be pretty big (let's say 2:1 for now), but the memory might be 1:1. So I would imagine 64GB + 16 CPUs can give us 8 VMs. So my ballpark guesstimate (I don't need to be precise for this :-) ) is that 200 such physical machines can serve ~1500 build VMs, which is not too scary from a budgeting perspective. | 02:11 |
mordred | jeblair: shade/nodepool gate fixed - feel free to +A those two changes | 02:12 |
mordred | Shuo: yes - I believe we have guesstimated similar numbers | 02:14 |
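For anyone following the budgeting thread, here is a minimal sketch of the back-of-envelope math above. The test-VM size is taken from clarkb's figures; the host specs and oversubscription ratios are illustrative assumptions, not measured values.

```python
# Back-of-envelope capacity estimate. The test-VM size (8 vCPU / 8GB RAM /
# >=80GB disk) is quoted above; host size and packing ratios are assumptions.
VM_VCPU, VM_RAM_GB, VM_DISK_GB = 8, 8, 80

HOST_CORES, HOST_RAM_GB, HOST_DISK_GB = 16, 64, 1000   # hypothetical host
CPU_OVERSUB = 2.0    # assumed 2:1 vCPU-to-core packing
RAM_OVERSUB = 1.0    # assumed 1:1 memory packing

by_cpu = int(HOST_CORES * CPU_OVERSUB // VM_VCPU)
by_ram = int(HOST_RAM_GB * RAM_OVERSUB // VM_RAM_GB)
by_disk = HOST_DISK_GB // VM_DISK_GB

# The effective per-host capacity is whichever resource runs out first.
print("per-host limits:", {"cpu": by_cpu, "ram": by_ram, "disk": by_disk})
print("VMs per host:", min(by_cpu, by_ram, by_disk))
```

With these particular assumptions CPU is the binding constraint at 4 VMs per host; the ~1500-VM estimate above effectively assumes memory is the limit.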
Shuo | mordred: thanks :-) | 02:17 |
Shuo | mordred: how did we plan for network proximity? (I heard that those physical machines are located in different vendors' DCs, and they are continents away from each other?) | 02:24 |
Shuo | my question is more or less about how we move the bits around (say, if we need some new version of an image, which might be a big chunk of bits), are we potentially experiencing network bandwidth problems? | 02:28 |
mordred | Shuo: ah - so ... we do have that situation (we have at least one cloud region in europe) | 02:36 |
mordred | in our case, we deal with everything in terms of regions of clouds and treat regions as physically disparate, even if they are regions from the same provider | 02:36 |
mordred | this means we have to upload new images to each region when we make them (which for us is daily, and many images, and is a LOT of bandwidth) | 02:37 |
mordred | it also means that we have implemented a mirror infrastructure which has per-region mirrors of frequently fetched artifacts | 02:37 |
mordred | and we have scripts that run when a node reaches ready state that set local config on the node to point to the mirror in the same region as the node | 02:38 |
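A minimal sketch of the kind of ready-time configuration script mordred describes, assuming a hypothetical per-region mirror naming scheme and environment variable; the real infra scripts and hostnames may differ.

```python
# Sketch only: point a freshly-launched node at a mirror in its own region.
# "NODEPOOL_REGION" and the mirror hostname pattern are hypothetical.
import os

region = os.environ.get("NODEPOOL_REGION", "region-1")
mirror = f"mirror.{region}.example.org"

pip_conf = f"""[global]
index-url = https://{mirror}/pypi/simple
"""

def configure_node():
    # Write pip config so jobs fetch packages in-region instead of across
    # the internet; similar rewrites would apply to apt/yum sources, etc.
    with open("/etc/pip.conf", "w") as f:
        f.write(pip_conf)

if __name__ == "__main__":
    configure_node()
```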
Shuo | mordred: could you elaborate a bit on "a LOT of bandwidth" by giving a back-of-envelope calculation? | 02:38 |
mordred | Shuo: well - we don't pay for bandwidth, so we have made some choices with that in mind ... | 02:38 |
mordred | Shuo: each of our images is in the 8G range | 02:39 |
Shuo | mordred: if you need to update (push) new images to each region once a day, how does it cost "A LOT" of bandwidth? | 02:39 |
mordred | and we have ... 6 images and 14 regions of cloud | 02:40 |
mordred | so we push 672G of image updates to our clouds daily from our image build machines | 02:41 |
mordred | _roughly_ | 02:41 |
mordred | the number is slightly higher because the rackspace regions take vhd format which is about twice as big | 02:41 |
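Making the arithmetic above explicit:

```python
# Daily image-upload volume from the figures quoted above.
IMAGES = 6
REGIONS = 14
IMAGE_SIZE_GB = 8

print(IMAGES * REGIONS * IMAGE_SIZE_GB, "GB/day")  # 672 GB/day, before the
                                                    # ~2x vhd overhead for the
                                                    # rackspace regions
```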
Shuo | mordred: is 20 images a good number to think of for that problem? if so, 8GB/image * 20 = 160 GB (~1280 Gbit) toward a 'region' | 02:41 |
mordred | I honestly think 20 is a bit high - but it would _certainly_ be a good high-water mark | 02:42 |
Shuo | mordred: thanks, you already gave me the answer to my previous question | 02:42 |
mordred | \o/ | 02:42 |
mordred | Shuo: fwiw, we build new images and they are that size because we pre-cache a bunch of things in the image in an attempt to reduce internet traffic during jobs | 02:44 |
mordred | it's possible that for another installation with different usage characteristics than ours, a different tradeoff might be desired | 02:44 |
Shuo | mordred: I guess the above is the main bandwidth cost, right? for each individual build/test job, we only need to pull down the python code from github, which is tiny, right? | 02:45 |
mordred | well - you'd think - but the nova repository is actually quite large (as are a few of the others) | 02:46 |
mordred | so we actually have copies of every git repo we deal with in our base images so that the only thing we're fetching at job time is the proposed changes (and any other changes that might have landed that day) | 02:46 |
mordred | we also don't clone from github - the failure rate is too high | 02:46 |
mordred | we run a farm of 8 git mirrors behind a load balancer - although that is currently centralized | 02:47 |
mordred | we have plans to investigate per-region git mirrors, but haven't yet done that | 02:47 |
Shuo | mordred: hmm, then how do we manage codeline consistency? | 02:47 |
mordred | Shuo: what do you mean? | 02:48 |
Shuo | mordred, let me try to recap what you said.... | 02:48 |
Shuo | mordred: first, we have a once-a-day network consumption to push the daily-built new image to each region; | 02:49 |
mordred | yes | 02:49 |
Shuo | mordred: then, we have a per-build/test code pull (and I thought it was a tiny amount of traffic, but you said it could also be a huge amount. Ok, let's make a parking lot for that for now and come back to it later). And you said you have 8 git servers for the different regions to pull from, is this a correct understanding? | 02:52 |
mordred | yes. BUT - it _should_ be a tiny amount of traffic, as it should only at most be a daily delta from what we cached in our image | 02:53 |
mordred | the main reason I mention it is that our particular approach to pre-caching and mirroring might be overkill for you - and if it is, it could make those image sizes smaller | 02:53 |
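A minimal sketch of the "pre-cache in the image, fetch only the delta at job time" idea described above. The cache path and mirror URL are illustrative, not the actual infra layout.

```python
import subprocess

CACHE = "/opt/git/openstack/nova"                           # repo baked into the image
MIRROR = "https://git-mirror.example.org/openstack/nova"    # hypothetical mirror URL
WORKSPACE = "/home/zuul/workspace/nova"

def prepare_repo():
    # Cloning from the on-disk cache costs no network traffic.
    subprocess.check_call(["git", "clone", CACHE, WORKSPACE])
    # Only commits that landed since the image was built cross the network.
    subprocess.check_call(["git", "-C", WORKSPACE, "fetch", MIRROR])

if __name__ == "__main__":
    prepare_repo()
```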
Shuo | mordred: "run a farm of 8 git mirrors...", how are these 8 mirrors sync-ed with the primary git? (I just wonder if there exist a tiny window when a mirror git does not hold a commit when the test job asks for it) | 02:57 |
Shuo | mordred: not sure if I asked my question clearly above -- let me know if I did not | 02:58 |
mordred | Shuo: our gerrit server replicates to them on push - so yes, there is a possible race, but the zuul cloning takes that into account | 02:58 |
Shuo | mordred: let me try to describe the different git servers' sync workflow here, and please correct me if I am not making the right description... | 03:00 |
Shuo | mordred: 1) the code passing the final gating test gets merged into HEAD on the Gerrit git server; 2) then the Gerrit git server pushes that commit to the 8 git replica servers; 3) any build/test job fetches from one of the replica git servers; 4) if a particular commit is not available (yet) on the replica git, zuul might wait for some time and retry, I guess. Close enough? ;-) | 03:03 |
mordred | yes. that is it | 03:05 |
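A minimal sketch of tolerating the replication race in step 4: if the replica has not received the ref yet, retry a few times before giving up. The retry count and delay are illustrative; Zuul's actual cloner has its own logic.

```python
import subprocess
import time

def fetch_with_retry(workspace, mirror, ref, attempts=5, delay=10):
    for attempt in range(attempts):
        try:
            subprocess.check_call(["git", "-C", workspace, "fetch", mirror, ref])
            return
        except subprocess.CalledProcessError:
            if attempt == attempts - 1:
                raise
            time.sleep(delay)  # give gerrit->mirror replication a moment to catch up
```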
mordred | clarkb: ^^ please correct me if I've lied to Shuo ... my brain doesn't always work perfectly | 03:05 |
Shuo | is github.com/openstack/A-project one of the 8 replicas? or is it the 9th? | 03:06 |
Shuo | mordred: ^^ | 03:13 |
mordred | Shuo: it's the 9th | 03:17 |
mordred | we don't actually use github for anything - we just replicate there for dev convenience | 03:18 |
mordred | Shuo: my flight is landing, so I'm going to afk ... | 03:19 |
Shuo | mordred: kk, thanks for sharing... | 03:20 |
Shuo | mordred: regarding the github question, I was just trying to understand if it's purely a push (in this sense it is the same as the other 8 replicas) from the gerrit git server (the 'master' git) | 03:21 |
*** Shuo has quit IRC | 03:37 | |
*** bhavik1 has joined #zuul | 04:39 | |
*** bhavik1 has quit IRC | 04:43 | |
*** saneax-_-|AFK is now known as saneax | 05:17 | |
*** Cibo_ has joined #zuul | 05:28 | |
*** bhavik1 has joined #zuul | 05:31 | |
*** bhavik1 has quit IRC | 05:56 | |
*** saneax is now known as saneax-_-|AFK | 08:10 | |
*** saneax-_-|AFK is now known as saneax | 08:18 | |
*** hashar has joined #zuul | 08:32 | |
*** saneax is now known as saneax-_-|AFK | 09:11 | |
*** saneax-_-|AFK is now known as saneax | 09:33 | |
*** pabelanger has quit IRC | 09:35 | |
*** wznoinsk has quit IRC | 09:35 | |
*** mmedvede has quit IRC | 09:35 | |
*** wznoinsk has joined #zuul | 09:35 | |
*** pabelanger has joined #zuul | 09:35 | |
*** mmedvede has joined #zuul | 09:36 | |
*** rbergeron has quit IRC | 09:37 | |
*** rbergeron has joined #zuul | 09:37 | |
*** hogepodge has quit IRC | 09:38 | |
*** SpamapS has quit IRC | 09:38 | |
*** hogepodge has joined #zuul | 09:39 | |
*** openstack has joined #zuul | 14:31 | |
*** saneax-_-|AFK is now known as saneax | 14:32 | |
*** saneax is now known as saneax-_-|AFK | 14:36 | |
*** saneax-_-|AFK is now known as saneax | 14:40 | |
*** Cibo_ has quit IRC | 15:00 | |
openstackgerrit | Lenny Verkhovsky proposed openstack-infra/nodepool: Fixed typo in info msg https://review.openstack.org/418435 | 15:08 |
*** mptacekx has quit IRC | 16:05 | |
*** saneax is now known as saneax-_-|AFK | 16:55 | |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Remove unsed variables https://review.openstack.org/418492 | 17:12 |
*** herlo has quit IRC | 17:22 | |
*** herlo has joined #zuul | 17:27 | |
*** herlo has joined #zuul | 17:27 | |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: [WIP] Support AFS mirrors for nodepool diskimages https://review.openstack.org/414273 | 17:37 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Allow nodepool-builder to only build diskimages https://review.openstack.org/412160 | 17:41 |
*** hashar has quit IRC | 18:02 | |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Remove the ability for nodepoold to launch a builder https://review.openstack.org/418137 | 18:04 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Allow nodepool-builder to only build diskimages https://review.openstack.org/412160 | 18:18 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Allow nodepool-builder to only build diskimages https://review.openstack.org/412160 | 18:31 |
*** Shuo has joined #zuul | 18:34 | |
openstackgerrit | Merged openstack-infra/nodepool: Source glean installs in simple-init https://review.openstack.org/414662 | 18:48 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: [WIP] Support AFS mirrors for nodepool diskimages https://review.openstack.org/414273 | 19:00 |
*** klindgren has joined #zuul | 19:45 | |
harlowja | rbergeron would you be the person I can perhaps ask ansible tower questions, sorta new to it, and klindgren and I had some questions about functionality | 19:58 |
harlowja | (nothing crazy hard) | 19:58 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul: Separate driver interfaces and make abstract https://review.openstack.org/418554 | 20:00 |
rbergeron | harlowja: not really (being the community gal, and it's not open source, so I've had super limited interaction with it -- | 20:02 |
rbergeron | but happy to find the right person :) | 20:02 |
jeblair | jhesketh, jamielennox, jlk, SpamapS: ^ the parent of 418554 is the restructuring into drivers change which you may recall; 418554 is all about making a nice api for driver implementors. it's one approach we could take. please check it out and let me know if you like that direction, or think something else would be better. | 20:03 |
harlowja | rbergeron thx, klindgren I think had a question, hopefully he remembers (pattern matching related or something) | 20:28 |
klindgren | basically does ansible tower have a rules engine? As an example can I feed it notification of alerts from a system, and it be able to do matching against those alerts to trigger specific jobs? Or is the expectation that via the API that something else needs to do that work and just call a defined playbook? | 20:31 |
harlowja | (or perhaps said thing is a planned/but not yet feature) | 20:39 |
Shrews | harlowja: is there a safe way to delete a lock node after it is released? | 20:46 |
Shrews | harlowja: with kazoo, that is | 20:47 |
harlowja | hmmm, do you know if it's unused by others? | 20:48 |
harlowja | if known unused, ya, just delete the lock directory | 20:50 |
Shrews | harlowja: well now, that's the issue :) | 20:50 |
harlowja | if not known unused | 20:50 |
Shrews | i was hoping there was an option to release() that would delete it before unlocking it, but alas, no | 20:50 |
harlowja | ya, that'd involve a little bit more, cause locks at least via kazoo aren't just single nodes | 20:51 |
harlowja | they are directories | 20:51 |
Shrews | jeblair: i think we may need a cleanup thread in nodepoold to clean up request locks that are older than X | 20:51 |
jeblair | Shrews: yeah, or lock nodes that are for request nodes that don't exist any more | 20:51 |
harlowja | so a release() method would almost need to drop a "this_lock_is_dead" file and wake other waiters to tell them to drop their waiting | 20:52 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: [WIP] Support AFS mirrors for nodepool diskimages https://review.openstack.org/414273 | 20:53 |
harlowja | Shrews but idea, try to delete the lock directory | 20:55 |
harlowja | i don't think https://github.com/python-zk/kazoo/blob/master/kazoo/recipe/lock.py#L184 will handle it nicely, but one way to find out | 20:55 |
harlowja | lol | 20:55 |
Shrews | harlowja: we'll just do the cleanup thread idea :) | 20:56 |
harlowja | k | 20:56 |
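A minimal sketch of the cleanup-thread idea Shrews and jeblair settle on above: periodically delete lock znodes that have no contenders and are older than a threshold. The lock root path and age limit are hypothetical, not nodepool's actual values.

```python
import time
import threading
from kazoo.client import KazooClient
from kazoo.exceptions import NoNodeError, NotEmptyError

LOCK_ROOT = "/nodepool/requests-lock"   # hypothetical lock root
MAX_AGE = 8 * 3600                      # seconds; illustrative threshold

def cleanup_stale_locks(zk):
    now = time.time()
    for name in zk.get_children(LOCK_ROOT):
        path = f"{LOCK_ROOT}/{name}"
        try:
            _, stat = zk.get(path)
            # Kazoo locks are directories; an empty, old one has no contenders.
            if not zk.get_children(path) and now - stat.mtime / 1000.0 > MAX_AGE:
                zk.delete(path)   # non-recursive: fails if a contender appeared
        except (NoNodeError, NotEmptyError):
            pass  # someone grabbed or removed the lock meanwhile; skip it

def cleanup_loop(zk, interval=300):
    while True:
        cleanup_stale_locks(zk)
        time.sleep(interval)

if __name__ == "__main__":
    zk = KazooClient(hosts="127.0.0.1:2181")
    zk.start()
    threading.Thread(target=cleanup_loop, args=(zk,), daemon=True).start()
    time.sleep(3600)  # keep the demo process alive
```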
rbergeron | klindgren / harlowja: will ask. no idea. :) | 21:09 |
harlowja | :) | 21:09 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: Add framework for handling node requests https://review.openstack.org/418585 | 21:45 |
jhesketh | Morning | 21:46 |
Shrews | jeblair: hrm, i just noticed that we've made some changes to nodepool/zk.py after making the latest features/zuulv3 branch. i guess we can merge those in later, but we'll probably get conflicts. | 21:46 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: Add framework for handling node requests https://review.openstack.org/418585 | 21:55 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: Add framework for handling node requests https://review.openstack.org/418585 | 21:58 |
harlowja | Shrews how's the overall zookeeper stuff going btw? | 21:59 |
Shrews | harlowja: pretty decently so far. the current nodepool image builder we run in infra production is using it pretty successfully | 22:01 |
harlowja | cool | 22:10 |
pabelanger | much better than gearman at this point. Zero backed-up builds | 23:11 |
*** saneax-_-|AFK is now known as saneax | 23:17 | |
*** morgan has quit IRC | 23:23 | |
*** morgan_ has joined #zuul | 23:38 | |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: [WIP] Support AFS mirrors for nodepool diskimages https://review.openstack.org/414273 | 23:40 |
openstackgerrit | Merged openstack-infra/nodepool: Remove the ability for nodepoold to launch a builder https://review.openstack.org/418137 | 23:41 |