*** rcarrillocruz has quit IRC | 01:11 | |
sfbender | Tristan de Cacqueray created software-factory/sfinfo master: Add patternfly-react-ui-deps package https://softwarefactory-project.io/r/13789 | 03:53 |
sfbender | Merged software-factory/sfinfo master: Add patternfly-react-ui-deps package https://softwarefactory-project.io/r/13789 | 03:55 |
sfbender | Merged www.softwarefactory-project.io master: 2018-38 summary https://softwarefactory-project.io/r/13706 | 05:41 |
*** sfbender has quit IRC | 06:06 | |
*** nijaba has quit IRC | 06:07 | |
*** nijaba has joined #softwarefactory | 06:08 | |
*** chkumar|off is now known as chandankumar | 07:18 | |
*** jpena|off is now known as jpena | 08:01 | |
spredzy | tristanC: yo | 08:16 |
spredzy | If you're around, any way you can help me figure out why https://github.com/ansible/awx/pull/2309 isn't running the proper job on zuul? | 08:17 |
spredzy | I see the event being picked up by zuul, but seems it attaches no job to it | 08:17 |
tristanC | spredzy: would you like to join mumble? | 08:20 |
* spredzy joins | 08:21 | |
spredzy | https://github.com/ansible/zuul-config/blob/master/zuul.d/projects.yaml#L7-L12 | 08:22 |
spredzy | https://github.com/ansible/zuul-jobs/blob/master/zuul.d/templates.yaml | 08:22 |
tristanC | spredzy: i think you need https://ansible.softwarefactory-project.io/docs/zuul/user/config.html#attr-pragma.implied-branch-matchers | 08:31 |
tristanC | because zuul.layout: Project template <ProjectConfig python source: ansible/zuul-jobs/zuul.d/templates.yaml@master {ImpliedBranchMatcher:master}> did not match item | 08:32 |
tristanC | spredzy: https://ansible.softwarefactory-project.io/docs/zuul/user/config.html#attr-pragma.implied-branches | 08:35 |
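A minimal sketch of the pragma tristanC is pointing at, assuming it sits at the top of zuul.d/templates.yaml in ansible/zuul-jobs (the exact placement is an assumption):

```yaml
# hedged sketch -- zuul.d/templates.yaml in ansible/zuul-jobs
- pragma:
    # don't attach an implied "master" branch matcher to the
    # config objects defined in this file, so the templates can
    # apply to items on other branches as well
    implied-branch-matchers: false
```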
*** sfbender has joined #softwarefactory | 08:57 | |
sfbender | Merged software-factory/managesf master: Fix issue where we expect to have the group resources https://softwarefactory-project.io/r/13776 | 08:57 |
sfbender | Merged software-factory/sf-config master: Fix config/sf-jobs location path gen for external gerrit https://softwarefactory-project.io/r/13760 | 09:02 |
*** zoli is now known as zoli|lunch | 09:59 | |
*** zoli|lunch is now known as zoli | 09:59 | |
*** jpena is now known as jpena|lunch | 11:06 | |
matburt | tristanC how's that nodepool config with the ansible static nodes looking? | 11:42 |
tristanC | matburt: it's looking like: https://softwarefactory-project.io/cgit/config/tree/nodepool/ansible.yaml | 11:49 |
tristanC | matburt: shanemcd-: i sent you a mail about it btw (titled "Ansible zuul/nodepool setup with AWS"), but it seems like it didn't reach any of the recipients' inboxes :( | 11:50 |
matburt | let me look | 11:51 |
matburt | hah I do have it... not sure why I glossed over it. | 12:01 |
sfbender | Merged www.softwarefactory-project.io master: Add Kubernetes Nodepool Driver blog post https://softwarefactory-project.io/r/13725 | 12:02 |
matburt | tristanC given that we have the template defined in zuul-jobs (and no current definition on runner or awx) can you assign the nodeset to a project template? | 12:03 |
tristanC | matburt: yes, you can set the "nodeset" job attribute similarly to the "vars", e.g. here: https://github.com/ansible/zuul-jobs/blob/master/zuul.d/templates.yaml#L12 | 12:04 |
tristanC | or here is another example: https://softwarefactory-project.io/cgit/DLRN/tree/.zuul.yaml#n53 | 12:05 |
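A rough sketch of overriding the nodeset where a job is attached in a project-template, as tristanC describes; the template, job, and nodeset names are illustrative:

```yaml
# hedged sketch -- zuul.d/templates.yaml in ansible/zuul-jobs
- project-template:
    name: ansible-tox-jobs
    check:
      jobs:
        - tox-linters:
            nodeset: static-ansible   # run this job on the static nodes
            vars:
              tox_envlist: linters
```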
tristanC | spredzy: not sure what happened, but https://github.com/ansible/zuul-jobs/commits/master 503 :) | 12:05 |
tristanC | oh nevermind, it's now loading | 12:06 |
matburt | https://github.com/ansible/zuul-jobs/pull/12 | 12:17 |
matburt | let me rekick my awx job... which is busted now, but I want to see what's needed on the static nodes | 12:18 |
tristanC | matburt: we might want to adapt the base job to make sure the src directory is absent before copying the workspace, i don't think it's actually cleaned otherwise | 12:19 |
matburt | hmm sounds good | 12:19 |
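A possible shape for the cleanup tristanC suggests, as a pre-run task in the base job; the src path is an assumption about where the workspace lands on the static nodes:

```yaml
# hedged sketch for the base job's pre-run playbook on static nodes
- hosts: all
  tasks:
    - name: Remove any leftover src directory before syncing the workspace
      file:
        path: "{{ ansible_user_dir }}/src"
        state: absent
```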
matburt | I have to hop on a meeting, I'll be back in a few | 12:19 |
matburt | scratch that... no meeting. | 12:22 |
matburt | I reckon I'm going to need some dependencies installed on the static nodes | 12:22 |
tristanC | matburt: we should keep static node customisation in a playbook to be merged in ansible/zuul-config/nodepool, next to the script i proposed to create the k8s image | 12:24 |
tristanC | matburt: then we could have post and periodic jobs that run it | 12:24 |
matburt | agreed | 12:25 |
tristanC | matburt: we actually do that for the runC slave, a similar job can be added to the ansible/zuul-config post pipeline | 12:25 |
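A hedged sketch of what such a job in ansible/zuul-config could look like; the job name, playbook path, label, and pipeline wiring are assumptions:

```yaml
# hedged sketch -- a customization job attached to the post (and periodic) pipeline
- job:
    name: ansible-static-customize
    description: Apply the static-node customization playbook
    run: nodepool/static-customize.yaml
    nodeset:
      nodes:
        - name: static-node
          label: static-ansible   # hypothetical label for the static nodes

- project:
    post:
      jobs:
        - ansible-static-customize
    periodic:
      jobs:
        - ansible-static-customize
```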
tristanC | matburt: shanemcd-: btw, the blogpost about k8s nodepool driver is now published here: https://www.softwarefactory-project.io/kubernetes-nodepool-driver-tech-preview.html | 12:27 |
matburt | nice... I want to dig into that more for our smoke tests tristanC | 12:28 |
matburt | which I'll need to turn up pretty soon | 12:29 |
matburt | This week the goal is to get linters and unit/functional tests running. Next week shanemcd- and I are going to be at Ansiblefest | 12:30 |
tristanC | matburt: sure, well i haven't added the k8s provider to the production nodepool, it's only working in my sandbox. let me know when you are ready to use it and we'll enable it on sf-project.io | 12:31 |
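For context, a rough sketch of a nodepool kubernetes provider along the lines of the driver documentation; the context and label names are placeholders, not the sandbox config:

```yaml
# hedged sketch of a nodepool kubernetes provider
labels:
  - name: kubernetes-namespace
    min-ready: 0
  - name: pod-fedora
    min-ready: 0

providers:
  - name: k8s-sandbox          # placeholder provider name
    driver: kubernetes
    context: sandbox-admin     # a kubeconfig context, placeholder
    pools:
      - name: main
        labels:
          - name: kubernetes-namespace
            type: namespace
          - name: pod-fedora
            type: pod
            image: docker.io/fedora:28
```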
matburt | tristanC do you think that k8s job should merge? | 12:31 |
tristanC | what do you mean by "should merge"? | 12:32 |
matburt | okay gotcha. I don't want to rock the boat... if we have some time we might be able to turn it up this week (in the short time we have left) otherwise we might wait until after fest. | 12:32 |
matburt | https://github.com/ansible/zuul-config/pull/21 | 12:32 |
matburt | that's passing the checks and just waiting to merge | 12:32 |
tristanC | matburt: actually that job should be marked as "abstract", it doesn't have a run phase and doesn't do anything, it's meant to be used as a parent job for your smoke tests job | 12:33 |
matburt | gotcha | 12:33 |
tristanC | matburt: we could merge and iterate over the script, or keep it open until we are satisfied with the content, i don't mind either way | 12:34 |
matburt | it looks like this just produces an image so it might not necessarily need to run on every build? | 12:34 |
*** jpena|lunch is now known as jpena | 12:34 | |
matburt | I might need to look into how abstract jobs and parent jobs work, and how those work together with regular jobs | 12:35 |
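A short sketch of the abstract-parent / child relationship tristanC describes; the job names, label, and playbook path are illustrative:

```yaml
# hedged sketch of an abstract parent job and a smoke-test child
- job:
    name: awx-container-base
    abstract: true                  # cannot run directly, only be inherited from
    nodeset:
      nodes:
        - name: container
          label: pod-fedora

- job:
    name: awx-smoke-test
    parent: awx-container-base
    run: playbooks/smoke.yaml       # the child supplies the run phase
```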
tristanC | matburt: the awx-test-image.sh isn't actually used, it's just the recipe i used for docker://docker.io/softwarefactoryproject/awx-test-image | 12:35 |
matburt | yep indeed | 12:36 |
matburt | also a good chance to dive into buildah ;) | 12:36 |
matburt | Oh I see... base-pod | 12:36 |
* spredzy would need to dive into it too - never used up until now | 12:43 | |
matburt | tristanC do you have some examples of how yall typically prepare nodepool members for use? Something I could borrow for inspiration when putting together the static nodepool systems | 12:44 |
tristanC | matburt: it's not pretty, but we run these tasks on the runC host: https://softwarefactory-project.io/cgit/config/tree/nodepool/runC/customize.yaml | 12:49 |
matburt | excellent, I appreciate that | 12:49 |
tristanC | you could write a static-customize playbook that runs on an "ansible-static" group, then we could generate the inventory out of that list: https://softwarefactory-project.io/cgit/config/tree/nodepool/ansible.yaml | 12:50 |
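A minimal sketch of such a static-customize playbook, assuming an "ansible-static" inventory group generated from nodepool/ansible.yaml; the package list is only an example of AWX test dependencies:

```yaml
# hedged sketch -- static-customize.yaml run against the static nodes
- hosts: ansible-static
  become: true
  tasks:
    - name: Install AWX test dependencies
      package:
        name:
          - git
          - gcc
          - python3-devel
        state: present
```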
*** zoli is now known as zoli|afk | 13:05 | |
pabelanger | tristanC: nhicher: we have 6 nodes marked ready in nodepool, but zuul isn't using them | 13:43 |
pabelanger | can you look at nodepool-launcher and see why? | 13:43 |
pabelanger | remote: https://softwarefactory-project.io/r/13792 Drop max-ready-age to 30mins | 13:46 |
pabelanger | I am going to approve ^ to see if we properly clean them up | 13:47 |
pabelanger | but, I also think we are reaching the limits of a single nodepool-launcher | 13:47 |
pabelanger | I'd love for us to create nl02 for software-factory | 13:48 |
pabelanger | okay, nodepool managed to clean them up | 13:54 |
pabelanger | but would be interested to see why we are leaking them to start with | 13:54 |
nhicher | pabelanger: there is a vexxhost-ansible-network-mtl1 node in-use | 13:55 |
pabelanger | nhicher: yah, before that we had 5 vexxhost-ansible-network-sjc1 nodes ready for 30+mins | 13:56 |
pabelanger | that shouldn't happen, because we set min-ready: 0, to avoid billing charges | 13:56 |
pabelanger | so, don't know if nodepool-launcher is overloaded, or we are somehow launching too many VMs against requests | 13:57 |
pabelanger | I've lowered max-ready-age to 30mins to help, but it means in some cases we now wait 30mins for jobs to run | 13:57 |
pabelanger | I'm thinking 5mins is likely a good time for max-age, but really want to know why we are leaking them in the first place | 13:58 |
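The label settings under discussion, sketched out; the label name is illustrative and max-ready-age is expressed in seconds:

```yaml
# hedged sketch of the relevant nodepool label settings
labels:
  - name: ansible-fedora-28-vexxhost   # placeholder label name
    min-ready: 0          # never keep pre-launched nodes idle, to avoid billing
    max-ready-age: 1800   # recycle a ready-but-unused node after 30 minutes
```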
nhicher | pabelanger: there are a lot of errors in launcher.log, I will check what the issue is after my daily meeting | 14:00 |
pabelanger | nhicher: great, thanks | 14:00 |
nhicher | not only for ansible-network, but tripleo, rdocloud ... | 14:00 |
pabelanger | nhicher: is it quota errors? | 14:02 |
nhicher | pabelanger: ERROR nodepool.NodeLauncher-0000193913: Launch failed for node $uuid | 14:08 |
pabelanger | nhicher: is there a traceback? | 14:09 |
pabelanger | nhicher: is nodepool-launcher at 100%CPU by chance? | 14:11 |
*** zoli|afk is now known as zoli | 14:14 | |
nhicher | floating ip issue pabelanger :/ | 14:17 |
pabelanger | nhicher: for vexxhost? | 14:18 |
pabelanger | we shouldn't be using FIPs there | 14:18 |
nhicher | no, for rdo-cloud* | 14:18 |
pabelanger | nhicher: ack, so are you seeing any errors for vexxhost specifically? | 14:18 |
nhicher | I have to check, the error was for ansible-network-vyos-1.1.8 | 14:19 |
pabelanger | nhicher: ah, okay | 14:19 |
pabelanger | yes, that is still on rdocloud | 14:19 |
pabelanger | 1 sec, let me get you a UUID for vexxhost | 14:20 |
nhicher | right now, there are 4 nodes in-use for vexxhost | 14:20 |
pabelanger | nhicher: ansible-fedora-28-vexxhost-ansible-network-mtl1-0000194492 | 14:21 |
pabelanger | nhicher: can you check the state changes for that in nodpeool-launcher | 14:21 |
pabelanger | eg: I think that was ready for a good 20 mins | 14:21 |
pabelanger | but never allocated to zuul | 14:21 |
nhicher | https://softwarefactory-project.io/paste/show/1219/ | 14:21 |
nhicher | https://softwarefactory-project.io/paste/show/1220/ | 14:23 |
pabelanger | nhicher: yah, running jobs are working great. There seems to be some lag in state changes between nodepool and zuul. | 14:24 |
pabelanger | openstack says the node is ready, but nodepool or zuul doesn't see it as ready, I think | 14:24 |
pabelanger | So, trying to figure out if it is related to excess load on nodepool-launcher (100% cpu) or some other issue in nodepool / zuul | 14:25 |
nhicher | nodepool-launcher 30% cpu (we have 4 cores) | 14:29 |
nhicher | pabelanger: load average: 0.16, 0.24, 0.31 | 14:29 |
pabelanger | nhicher: what about the nodepool-launcher pid? | 14:29 |
pabelanger | sadly it isn't multicore | 14:30 |
nhicher | pabelanger: we already had this issue https://tree.taiga.io/project/morucci-software-factory/issue/1561 | 14:46 |
pabelanger | nhicher: yah, the launcher logs should help here. | 14:47 |
pabelanger | nhicher: 0000194492 should be a good example to look at | 14:47 |
nhicher | pabelanger: for 194492, nodepool started to build node at 14:17:38 and zuul started job at 14:18:19 | 14:51 |
pabelanger | hmm | 14:51 |
pabelanger | nhicher: okay, let me try and reproduce | 14:51 |
pabelanger | need to get a few jobs into check pipeline | 14:51 |
nhicher | pabelanger: https://softwarefactory-project.io/paste/show/1221/ | 14:52 |
shanemcd- | Hi tristanC apologies I have been MIA for most of this week. Things have been hectic here getting ready for AnsibleFest. I will pick this up ASAP. | 14:54 |
shanemcd- | Thank you again for all your help on this. | 14:54 |
pabelanger | nhicher: classic heisenbug | 15:00 |
pabelanger | things seem to be working well now | 15:01 |
pabelanger | the next time it happens, I'll grab the uuid | 15:01 |
nhicher | pabelanger: ok | 15:03 |
*** chandankumar is now known as chkumar|off | 15:46 | |
sfbender | Fabien Boucher created software-factory/managesf master: wip - resources add cli tool https://softwarefactory-project.io/r/13794 | 16:01 |
matburt | tristanC if you get a chance can you take a look at this: https://ansible.softwarefactory-project.io/logs/09/2309/c781c6a116ce5bddf875379204187a0d86277de2/check/tox/015f1ef/ara-report/ | 16:48 |
matburt | n/m... revoked sudo for zuul | 16:55 |
pabelanger | looks like network issue | 16:57 |
pabelanger | https://ansible.softwarefactory-project.io/logs/09/2309/c781c6a116ce5bddf875379204187a0d86277de2/check/tox/015f1ef/job-output.txt.gz#_2018-09-27_16_45_22_770843 | 16:57 |
pabelanger | oh, is this a VM? | 16:58 |
pabelanger | or oci | 16:58 |
pabelanger | static-ansible | 16:58 |
pabelanger | guessing a container | 16:59 |
pabelanger | matburt: I think the issue is, you don't have access to sudo there | 16:59 |
pabelanger | so anytime you use it, it will fail | 16:59 |
matburt | it actually was that zuul *had* access to sudo and couldn't revoke it on its own | 16:59 |
matburt | this is a GCE instance so google has special sudo groups that the user needed to be removed from | 17:00 |
pabelanger | okay, then we likely need to update revoke-sudo role | 17:00 |
pabelanger | https://git.zuul-ci.org/cgit/zuul-jobs/tree/roles/revoke-sudo | 17:01 |
pabelanger | matburt: keep in mind, you might be the first person using GCE, so expect issues with some of the zuul-jobs | 17:02 |
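One possible shape for a GCE-aware tweak to revoke-sudo (not the actual zuul-jobs role content); the user name and group handling here are assumptions:

```yaml
# hedged sketch: on GCE images the test user gets sudo through google-managed
# groups, so a site-local variant of revoke-sudo could strip the user's
# supplementary group memberships
- name: Remove the zuul user from GCE sudo groups
  become: true
  user:
    name: zuul
    groups: ''       # drop all supplementary groups, keeping only the primary
    append: false
```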
matburt | yah, they are just static nodes | 17:02 |
matburt | trying to get our deps to line up | 17:02 |
pabelanger | matburt: there are VMs in rdocloud which will get the jobs working out of the box. you could start with them to get jobs working, then migrate once jobs shake out | 17:04 |
matburt | That's what we did with runner, they were great but we were looking to get some dedicated capacity | 17:05 |
matburt | so we worked with tristanC to get these turned up | 17:05 |
matburt | With AWX we need some dependencies locally | 17:06 |
matburt | and some of our tests can be kind-of... heavy. | 17:06 |
pabelanger | there is also vexxhost to run some stuff too, we're using 2 regions there now but only in the ansible-network tenant atm. We still need to discuss some budget around that also | 17:06 |
pabelanger | matburt: what sort of VM requirements? | 17:06 |
matburt | We have 3x 4-core 16GB of memory systems | 17:07 |
pabelanger | matburt: okay, so jobs today need a 16GB system to run? That is an integration test, right | 17:08 |
matburt | We're also working with tristanC to turn up a k8s nodepool... we have that active now, we'll start seeing about configuring that and getting it into yalls infra | 17:08 |
matburt | Well, they could... good to have the breathing room | 17:08 |
matburt | a lot of awx development and testing runs out of containers and we'd love to run our smoke tests from k8s... this is a great opportunity to do that | 17:09 |
pabelanger | today in vexxhost we have capacity for 1vcpu/1gb and 4vcpu/4gb. We can launch more flavors, just haven't because of cost. But we have the ability to go all the way up to 64gb ram I think | 17:09 |
matburt | We have a good bit of wiggle room with what infrastructure we can bring up | 17:09 |
matburt | it's not unusual for us to bring up some extremely large instances for one-off testing | 17:10 |
pabelanger | matburt: yah, the only concern I have right now is that zuul itself hasn't landed container support yet. So this is all experimental, I know it will get landed upstream, but it's been in the works for some time. So, something to consider if you are hoping to base testing on that | 17:10 |
*** jpena is now known as jpena|off | 17:11 | |
pabelanger | https://review.openstack.org/#/c/560136/ | 17:11 |
matburt | We're okay with being on the vanguard of that effort | 17:11 |
matburt | We can certainly *do* our testing in containers on the static nodes... it'd be fantastic if we could do it in k8s | 17:11 |
pabelanger | right, I would suggest maybe starting to work on your containers on VMs. get that 100%, then tristanC and the SF team can land the upstream patches for k8s into zuul. I know people like bmw and godaddy are also looking for that support too | 17:13 |
pabelanger | and with fest next week, I think it's a great time to give that feedback to the zuul team | 17:13 |
pabelanger | talking with mnaser, vexxhost also has k8s capacity, which means we could launch the cluster there too | 17:14 |
pabelanger | even multi-region, if vexxhost supported it | 17:14 |
mnaser | i've been thinking of like | 17:18 |
mnaser | zun + kata | 17:18 |
*** zoli is now known as zoli|gone | 17:42 | |
*** sshnaidm is now known as sshnaidm|off | 18:03 |