tristanC | gundalow: on another topic, we'll upgrade the ansible.sf-project.io host to the latest version of software-factory soon (version 3.1) | 00:04 |
---|---|---|
tristanC | gundalow: in that version, the ansible*/zuul-config project will now host that file https://softwarefactory-project.io/cgit/config/tree/zuul/ansible_networking.yaml | 00:05 |
gundalow | Cool. Is there an email list I should subscribe to for planned upgrade/outage notices? | 00:05 |
tristanC | gundalow: we don't have such a mailing list yet, there shouldn't be any outages | 00:06 |
gundalow | :) | 00:06 |
tristanC | gundalow: though the upgrade will propose an update to the zuul-config project, so there will be a PR to accept to make the new version effective | 00:06 |
gundalow | cool, will keep an eye out for that. Thanks for the heads up | 00:07 |
tristanC | it seems like we could do that in a couple of weeks, one month tops | 00:07 |
sfbender | Paul Belanger created software-factory/sf-config master: Fix grafana graph for executor memory usage https://softwarefactory-project.io/r/12862 | 00:39 |
sfbender | Paul Belanger created software-factory/sf-config master: Add executor HDD usage to zuul-status graph https://softwarefactory-project.io/r/12863 | 00:46 |
sfbender | Paul Belanger created software-factory/sf-config master: Add max_servers metric to nodepool test nodes graph https://softwarefactory-project.io/r/12864 | 01:03 |
sfbender | Merged www.softwarefactory-project.io master: Add 3.0 release note for new sf-config and acme-tiny version https://softwarefactory-project.io/r/12826 | 01:06 |
tristanC | logan-: i published a new sf-config and acme-tiny package in the 3.0 release repository. This should fix the bug you reported, thanks for the feedback! ( release note is: http://www.softwarefactory-project.io/releases/3.0/ ) | 01:14 |
*** caphrim007 has joined #softwarefactory | 01:21 | |
*** caphrim007 has quit IRC | 01:26 | |
*** caphrim007 has joined #softwarefactory | 01:35 | |
*** Guest38444 has quit IRC | 02:04 | |
*** Guest38444 has joined #softwarefactory | 02:08 | |
sfbender | Tristan de Cacqueray created software-factory/sf-config master: zuul: install missing packages for config-check https://softwarefactory-project.io/r/12865 | 02:53 |
*** caphrim007_ has joined #softwarefactory | 03:01 | |
*** caphrim007 has quit IRC | 03:04 | |
sfbender | Tristan de Cacqueray created software-factory/sf-ci master: Switch back to base job since log-classify is now integrated https://softwarefactory-project.io/r/12866 | 03:04 |
logan- | awesome tristanC, thanks for the follow up. i'm interested to deploy 3.1 and try out a private config repo. is this the process I should be looking at using zuul_rpm_build.py? https://softwarefactory-project.io/docs/contributor/prepare_dev_environment.html | 03:32 |
tristanC | logan-: you could give the current 3.1 candidate a try by running this task: https://softwarefactory-project.io/paste/show/1128/ | 03:36 |
tristanC | then continue the update process as documented here: https://softwarefactory-project.io/docs/operator/upgrade.html | 03:36 |
tristanC | e.g. yum update sf-config && sfconfig --upgrade | 03:36 |
logan- | thanks! | 03:37 |
tristanC | though note that private config repo has not been tested, so there are still probably some issues with it | 03:38 |
tristanC | for example, we need a toggle to restrict the default acl here: https://softwarefactory-project.io/cgit/software-factory/sf-config/tree/ansible/roles/sf-repos/templates/config/resources/_internal.yaml.j2#n59 | 03:39 |
tristanC | and this task also needs to be updated: https://softwarefactory-project.io/cgit/software-factory/sf-config/tree/ansible/roles/sf-repos/tasks/fetch_config_repo.yml#n5 | 03:39 |
tristanC | (the current process is to fetch the config repo on every host to apply new config, and this assumes public access to the repo) | 03:39 |
tristanC | so to enable a private config repo, we'll have to set up the access key on every host managed by sfconfig | 03:40 |
tristanC | or we could change the logic and push the config repo content from the install-server instead of pulling | 03:40 |
logan- | yeah, similar to how prepare-workspace pushes the repos | 03:41 |
tristanC | basically, any task using config_public_location needs to be fixed | 03:43 |
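
A minimal sketch of the push-based approach discussed above, run from the install-server (the Ansible control node for sfconfig); the paths and play target are assumptions for illustration, not actual sf-config code:

```yaml
---
# Hypothetical sketch: push the config repo checkout from the install-server
# to each managed host, instead of having every host fetch it from
# config_public_location (which assumes public access to the repo).
- hosts: all
  tasks:
    - name: Push the config repo content to the host
      synchronize:
        src: /root/config/      # checkout on the install-server (assumed path)
        dest: /root/config/     # destination on the managed host (assumed path)
        delete: true
```
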
tristanC | logan-: also, even if we support a private config repo (e.g. in gerrit), zuul may still leak its content, e.g. config-check and config-update job logs will be visible on the zuul status page and in the builds history | 03:45 |
logan- | good point | 03:46 |
tristanC | logan-: that can also be parametrized, e.g. if the private config option (TBD) is set, then we could make the task no_log and keep the artifacts locally on the executor | 03:56 |
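
A minimal sketch of that toggle, assuming a hypothetical `config_private` option; the variable name and placeholder command are illustrative, not existing sf-config code:

```yaml
---
# Hypothetical sketch: suppress task output only when the (assumed)
# config_private option is enabled, so public deployments keep full job logs.
- hosts: localhost
  vars:
    config_private: false                  # assumed toggle (TBD)
  tasks:
    - name: Run the config check (placeholder for the real step)
      command: /bin/true                   # stands in for the actual config-check command
      no_log: "{{ config_private | bool }}"
```
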
tristanC | feel free to try the 3.1 candidate version though, it still adds many great new features :-) | 03:57 |
logan- | will do! | 04:04 |
sfbender | Tristan de Cacqueray created software-factory/sf-config master: nodepool: fix dib cache location https://softwarefactory-project.io/r/12868 | 06:08 |
*** nchakrab has joined #softwarefactory | 06:13 | |
sfbender | Tristan de Cacqueray created software-factory/sf-docs master: Add log-classify user documentation https://softwarefactory-project.io/r/12869 | 06:27 |
sfbender | Tristan de Cacqueray created logreduce master: Fix ARA report directory link to ara-report https://softwarefactory-project.io/r/12870 | 06:36 |
sfbender | Tristan de Cacqueray created logreduce master: Update zuul-jobs log-classify role https://softwarefactory-project.io/r/12871 | 06:36 |
sfbender | Merged logreduce master: Fix ARA report directory link to ara-report https://softwarefactory-project.io/r/12870 | 06:38 |
sfbender | Merged logreduce master: Update zuul-jobs log-classify role https://softwarefactory-project.io/r/12871 | 06:40 |
sfbender | Merged www.softwarefactory-project.io master: Add sprint 2018-26 https://softwarefactory-project.io/r/12804 | 07:01 |
*** Guest38444 has quit IRC | 07:28 | |
*** Guest38444 has joined #softwarefactory | 07:31 | |
*** jpena|off is now known as jpena | 08:04 | |
sfbender | Merged software-factory/sf-ci master: Switch back to base job since log-classify is now integrated https://softwarefactory-project.io/r/12866 | 09:52 |
*** jpena is now known as jpena|lunch | 10:59 | |
sfbender | Merged software-factory/sf-config master: nodepool: fix dib cache location https://softwarefactory-project.io/r/12868 | 11:18 |
sfbender | Merged software-factory/sf-config master: Fix grafana graph for executor memory usage https://softwarefactory-project.io/r/12862 | 11:28 |
sfbender | Merged software-factory/sf-config master: zuul: install missing packages for config-check https://softwarefactory-project.io/r/12865 | 11:35 |
sfbender | Merged software-factory/cauth master: cauth/repoxplorer: Harden in case of repoxplorer or elasticsearch down https://softwarefactory-project.io/r/12831 | 11:51 |
*** Guest38444 has quit IRC | 12:01 | |
sfbender | Fabien Boucher created software-factory/managesf master: managesf/configuration/repoxplorer: Fix in case tenant does not have default-connection https://softwarefactory-project.io/r/12878 | 12:02 |
*** Guest38444 has joined #softwarefactory | 12:10 | |
*** jpena|lunch is now known as jpena | 12:16 | |
sfbender | Fabien Boucher created software-factory/managesf master: managesf/configuration: handle the private attribute https://softwarefactory-project.io/r/12879 | 12:31 |
rcarrillocruz | folks, any issues with the oci server | 12:42 |
rcarrillocruz | seeing a lot of node job reschedules | 12:42 |
rcarrillocruz | just got a retry limit | 12:42 |
tristanC | rcarrillocruz: yes, though it doesn't seem related to oci, other jobs are also failing with retry limit with the dib nodeset | 12:44 |
rcarrillocruz | oki | 12:44 |
tristanC | rcarrillocruz: i haven't found the bottleneck yet, i'll have a look tomorrow | 12:46 |
tristanC | we are migrating rdoproject.org jobs over to sf-project.io zuul, this may be causing a scaling issue between zuul and nodepool, or maybe the executors are overloaded | 12:47 |
tristanC | e.g.: https://softwarefactory-project.io/grafana/d/000000001/zuul-status?panelId=43&fullscreen&orgId=1&from=now%2FM&to=now | 12:47 |
tristanC | pabelanger: that graph seems a bit odd https://softwarefactory-project.io/grafana/d/000000001/zuul-status?panelId=44&fullscreen&orgId=1&from=now%2FM&to=now, shouldn't the executor load be lower? | 12:49 |
tristanC | they only have 4 CPUs each | 12:50 |
tristanC | pabelanger: symptoms are jobs take a long time to start, and sometimes bail out with 'retry_limit' | 12:53 |
*** nchakrab_ has joined #softwarefactory | 12:57 | |
sfbender | Fabien Boucher created software-factory/sf-config master: cgit and hound config: take care of the private attribute https://softwarefactory-project.io/r/12880 | 12:57 |
*** nchakrab has quit IRC | 13:00 | |
tristanC | pabelanger: zuul.conf currently uses load_multiplier=2.5, i think we could lower this to 2 or even 1.5 | 13:06 |
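
For reference, that setting lives in the [executor] section of zuul.conf; a sketch of lowering it, with the file path assumed to be the default:

```ini
# /etc/zuul/zuul.conf (excerpt) - lower the governor threshold so a 4-CPU
# executor stops accepting new jobs earlier.
[executor]
load_multiplier=2.0
```
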
pabelanger | tristanC: if you look at the executors at https://softwarefactory-project.io/grafana/d/000000001/zuul-status?orgId=1 it will show whether they are accepting jobs; if they are not accepting jobs, no builds will start | 13:52 |
pabelanger | that is likely because of governor | 13:52 |
pabelanger | tristanC: starting build graph looks to be good too | 13:53 |
pabelanger | rcarrillocruz: have a log? | 13:58 |
tristanC | pabelanger: e.g. PS5 of https://softwarefactory-project.io/r/#/c/12763/ | 13:58 |
tristanC | pabelanger: i wonder if executors may accept a job but then fail to start the build. sometimes on the status page, console logs just stop with END OF STREAM, e.g.: https://softwarefactory-project.io/zuul/t/rdoproject.org/stream.html?uuid=934eac35fa994e049ed78484318e57fd&logfile=console.log | 14:00 |
pabelanger | tristanC: so, i don't have proof yet, but I think we are seeing poor IO on the zuul-executor, which could be taking too long to do merge operations before running the playbook; if that fails, i believe the job will be rescheduled by the scheduler | 14:01 |
pabelanger | tristanC: can you send a copy of the executor logs for the jobs above^ | 14:02 |
tristanC | or that yes | 14:02 |
pabelanger | we should be able to see a timeout in logs | 14:02 |
pabelanger | tristanC: do we have SSDs in these compute nodes? I believe we should look to mount /var/lib/zuul on SSD, not ceph, to help get better IO | 14:03 |
pabelanger | local disk vs network disk | 14:03 |
tristanC | pabelanger: indeed WARNING zuul.AnsibleJob: [build: a24ebac1a2744059b7692512e36405d5] Ansible timeout exceeded | 14:04 |
pabelanger | tristanC: was that pre-run? | 14:05 |
tristanC | pabelanger: here is the log of the retry_limit rcarrillocruz got: https://ansible.softwarefactory-project.io/paste/show/piAZSihmXcQixLtnFb2z/ | 14:09 |
tristanC | pabelanger: and here is a similar failure happening with a dib nodeset: https://softwarefactory-project.io/paste/show/Q5Jb0maoKsnMf2A2wBrQ/ | 14:11 |
pabelanger | Hmm, that timeout looks to be short | 14:13 |
pabelanger | and we don't seem to log it | 14:13 |
pabelanger | tristanC: -9 is abort | 14:16 |
pabelanger | tristanC: so zuul aborted the run for some reason | 14:16 |
pabelanger | tristanC: new patchset? | 14:16 |
*** nchakrab_ has quit IRC | 14:17 | |
pabelanger | I don't think it is the new hdd sensor, it should only stop jobs from running, not abort them | 14:17 |
tristanC | pabelanger: iirc those were reported as retry_limit, and there is a warning about an ansible timeout | 14:20 |
pabelanger | tristanC: did zuul-executor get restarted during that time? | 14:20 |
pabelanger | tristanC: the scheduler log should give more info on the retries too | 14:21 |
*** nchakrab has joined #softwarefactory | 14:21 | |
tristanC | pabelanger: scheduler logs for the second build is https://softwarefactory-project.io/paste/show/HmAeuHC0Xb2tRjYxZTkG/ | 14:23 |
tristanC | pabelanger: first build is https://ansible.softwarefactory-project.io/paste/show/gx6lceANyJ8964dwJeoe/ | 14:23 |
tristanC | i got to go now, i'll debug more tomorrow | 14:24 |
*** nchakrab has quit IRC | 14:40 | |
rcarrillocruz | folks, i need to debug a weird issue on vyos_config, within the context of a zuul job run | 15:53 |
rcarrillocruz | who can help me out to do an autohold and inject my pubkey? | 15:53 |
pabelanger | fbo: ^ | 15:54 |
pabelanger | rcarrillocruz: sorry, I don't have access myself | 15:54 |
rcarrillocruz | ah, nhicher is not around | 15:54 |
rcarrillocruz | :/ | 15:54 |
fbo | rcarrillocruz: yep | 15:55 |
pabelanger | rcarrillocruz: I think he's on PTO | 15:55 |
rcarrillocruz | fbo: https://github.com/ansible-network/cloud-vpn/pull/3 | 15:55 |
rcarrillocruz | let me know when i push a new patchset | 15:56 |
rcarrillocruz | so the hold is made | 15:56 |
rcarrillocruz | my keys: https://github.com/rcarrillocruz.keys | 15:56 |
rcarrillocruz | or a recheck rather, don't really have anything to change on the PR | 15:57 |
fbo | looks like I need a job name | 15:58 |
rcarrillocruz | cloud-vpn-aws-vyos-to-aws-vpn | 15:58 |
fbo | rcarrillocruz: ^ | 15:58 |
fbo | ok | 15:58 |
fbo | rcarrillocruz: ok let's recheck your change | 15:59 |
rcarrillocruz | done | 15:59 |
fbo | rcarrillocruz: the link to your pub key ? | 16:01 |
rcarrillocruz | any from the link i pasted earlier | 16:02 |
rcarrillocruz | https://github.com/rcarrillocruz.keys | 16:02 |
fbo | thanks | 16:03 |
rcarrillocruz | what's the IP, i don't think that's logged in the job log | 16:04 |
rcarrillocruz | or wait, i think i can get it on the nodes dashboard | 16:05 |
rcarrillocruz | bah, not | 16:06 |
fbo | rcarrillocruz: zuul@38.145.33.133 | 16:07 |
rcarrillocruz | thx mate | 16:07 |
rcarrillocruz | where is the workspace put these days | 16:08 |
rcarrillocruz | [zuul@host-10-0-0-11 ~]$ pwd | 16:08 |
rcarrillocruz | /home/zuul | 16:08 |
rcarrillocruz | [zuul@host-10-0-0-11 ~]$ ls | 16:08 |
rcarrillocruz | wait | 16:09 |
rcarrillocruz | i think you need to put the key on zuul-worker user | 16:09 |
fbo | Oh but I was unable to connect as zuul-worker, and it's zuul that was defined in the nodepool config | 16:10 |
fbo | for that image | 16:10 |
rcarrillocruz | thing is the workspace (per the job def) is checked out in the zuul-worker home folder | 16:10 |
rcarrillocruz | this is odd | 16:11 |
rcarrillocruz | [zuul@host-10-0-0-11 ~]$ cd /home | 16:11 |
rcarrillocruz | [zuul@host-10-0-0-11 home]$ ls | 16:11 |
fbo | rcarrillocruz: you can sudo -i, can't you? | 16:11 |
rcarrillocruz | zuul | 16:11 |
rcarrillocruz | i can sudo, but i don't see the zuul-worker home folder anywhere | 16:11 |
fbo | same here. | 16:12 |
fbo | Is the image correct? I mean I did nothing specific, just logged in to it | 16:12 |
rcarrillocruz | well, the image is a f27-oci | 16:12 |
rcarrillocruz | that's not managed by me | 16:12 |
rcarrillocruz | i think you may have given me access to a node that is not part of the job | 16:13 |
fbo | ah so that's not the right image then | 16:13 |
rcarrillocruz | https://github.com/ansible-network/zuul-config/blob/master/zuul.d/jobs.yaml | 16:13 |
rcarrillocruz | what i need to get is access to the node that is running the job (still) | 16:13 |
rcarrillocruz | if you did the autohold, it should not be deleted by nodepool after the job ends, right? | 16:14 |
rcarrillocruz | ok | 16:15 |
rcarrillocruz | so the node is | 16:15 |
rcarrillocruz | 0000046738 | 16:15 |
rcarrillocruz | per https://ansible.softwarefactory-project.io/zuul/nodes.html | 16:15 |
rcarrillocruz | what's the IP of that node | 16:16 |
rcarrillocruz | nodepool list should show it | 16:16 |
fbo | rcarrillocruz: this is a container, not sure I can give you access then | 16:16 |
fbo | for the autohold there isn't an option for specifying the image, so the node on hold should be the right one | 16:16 |
rcarrillocruz | i would assume containers do run an ssh daemon, and they don't have another access mechanism? | 16:16 |
rcarrillocruz | i mean, the zuul executor connects to the node | 16:17 |
rcarrillocruz | i'd be surprised if the container is accessed by the zuul executor by connecting to the host, then doing something like a primitive docker exec or the like | 16:17 |
rcarrillocruz | if you run in nodepool | 16:18 |
rcarrillocruz | nodepool list |grep 0000046738 | 16:18 |
rcarrillocruz | what does it show | 16:18 |
fbo | rcarrillocruz: ok so let's try zuul-worker@38.145.33.82 | 16:19 |
fbo | try port 34999 | 16:20 |
fbo | that's the one specified on nodepool list --detail | 16:21 |
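
A sketch of the operator-side commands used here; the tenant and project arguments are placeholders to adjust for the actual deployment:

```sh
# Hold the nodes of the next failing run of that job (reason text is illustrative)
zuul autohold --tenant <tenant> --project <org>/cloud-vpn \
    --job cloud-vpn-aws-vyos-to-aws-vpn --reason "debug vyos_config" --count 1

# After the recheck fails, look up the held node's connection details (user, host, port)
nodepool list --detail | grep 0000046738
```
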
rcarrillocruz | can ssh to 22, cannot to 34999 | 16:21 |
rcarrillocruz | otoh, i'm in but there's no checked-out project, i assume it maybe got deleted, dunno | 16:22 |
rcarrillocruz | will try to recreate from this | 16:22 |
fbo | rcarrillocruz: well that's not the way to do it. I removed your key from there as this was the main oci node. | 16:27 |
rcarrillocruz | so that's the host? | 16:27 |
rcarrillocruz | you know what | 16:28 |
rcarrillocruz | i'll change the node type | 16:28 |
rcarrillocruz | and try to recreate from a real fedora node | 16:28 |
rcarrillocruz | pabelanger: did you create any fedora dib nodes on our tenant? | 16:29 |
rcarrillocruz | i was off last week, unsure what you did there | 16:29 |
fbo | I succeeded in connecting on 34999 and was in the container | 16:29 |
rcarrillocruz | let me retry then | 16:29 |
rcarrillocruz | [ricky@ricky-laptop ~]$ ssh zuul-worker@38.145.33.82 -p 34999 | 16:30 |
rcarrillocruz | Received disconnect from 38.145.33.82 port 34999:2: Too many authentication failures | 16:30 |
rcarrillocruz | Authentication failed. | 16:30 |
rcarrillocruz | you may need to put my pubkey in the container's zuul-worker authorized keys | 16:30 |
fbo | oh ok you had network access to the port, cool | 16:30 |
rcarrillocruz | so the way it works apparently | 16:31 |
rcarrillocruz | 22 is for the host | 16:31 |
rcarrillocruz | then each container | 16:31 |
rcarrillocruz | it spawns an sshd process on 34999 | 16:31 |
fbo | retry | 16:31 |
rcarrillocruz | just so zuul-executor can connect to it | 16:31 |
rcarrillocruz | oci slaves, that is | 16:31 |
rcarrillocruz | i'm in now | 16:31 |
rcarrillocruz | \o/ | 16:31 |
rcarrillocruz | and the change is checked out there | 16:31 |
rcarrillocruz | thx fbo, i can debug now | 16:32 |
fbo | rcarrillocruz: sorry it took some time to figure out how to do that. | 16:32 |
rcarrillocruz | hey, you solved it ;-) | 16:33 |
*** jpena is now known as jpena|off | 17:03 | |
*** fbo is now known as fbo|off | 17:14 | |
sfbender | Fabien Boucher created software-factory/managesf master: wip - managesf/resources: add extra validation for the private attribute https://softwarefactory-project.io/r/12883 | 17:31 |
*** Guest38444 has quit IRC | 18:43 | |
*** Guest38444 has joined #softwarefactory | 18:46 | |
*** caphrim007_ has quit IRC | 19:00 | |
*** caphrim007 has joined #softwarefactory | 19:01 | |
sfbender | Merged software-factory/sf-config master: zuul: integrate log-classify post actions https://softwarefactory-project.io/r/12763 | 20:08 |
gundalow | Created a new branch `stable-2.5` and I've protected the branch in GitHub, though Zuul doesn't seem to be running: https://github.com/ansible-network/network-engine/pull/107 I can't see anything in the dashboard | 22:34 |
*** Guest38444 has quit IRC | 22:58 | |
*** Guest38444 has joined #softwarefactory | 23:13 |