*** dpawlik has joined #softwarefactory | 06:04 | |
*** dpawlik has quit IRC | 06:04 | |
*** dpawlik has joined #softwarefactory | 06:07 | |
*** dpawlik has quit IRC | 06:07 | |
*** dpawlik has joined #softwarefactory | 06:08 | |
*** apevec has joined #softwarefactory | 06:43 | |
*** sshnaidm|off is now known as sshnaidm | 07:33 | |
*** jpena|off is now known as jpena | 07:49 | |
*** brendangalloway has joined #softwarefactory | 11:29 | |
*** jpena is now known as jpena|lunch | 11:30 | |
*** rfolco has joined #softwarefactory | 12:04 | |
brendangalloway | tristanC: Last week I had a question about the ara-report folder not being visible. Is there a config setting I can change to make it visible again? Our devs don't like the change to their workflow | 12:13 |
brendangalloway | I'm also noticing an issue where the post-config job does not appear to be updating the nodepool.yaml file correctly. The playbook runs '/bin/managesf-configuration nodepool --output "/etc/nodepool/nodepool.yaml" --extra-launcher --hostname runc' which generates a file with an empty providers entry. If I manually run the utility without the | 12:15 |
brendangalloway | extra launcher the file appears to be generated correctly | 12:15 |
tristanC | the ara-report folder is no longer visible but it is still available, either by appending `/ara-report/` to the log_url, or by clicking the `Ara Report` link from the build result | 12:15 |
tristanC | brendangalloway: that post-config issue rings a bell, let me check | 12:16 |
brendangalloway | We found you could type the url back in manually, but having the link in the folder was a lot more convenient when debugging. If it's not possible we'll live, but it would be preferred if we could restore the previous behaviour somehow | 12:19 |
tristanC | brendangalloway: that's unfortunate. This is happening because we switched the ara-api to be a dedicated service so that it could run the python3 version (previously it was running under mod_wsgi in apache, which meant it had to be py2 on centos) | 12:21 |
tristanC | brendangalloway: and we couldn't find a way to instruct apache to perform a rewrite rule while keeping the folder available in the generated index | 12:22 |
brendangalloway | And lastly, I'm trying to set up a kubernetes cluster in preparation for runc being deprecated in the next release. The documentation on what needs to be done is a bit scattered though and I'm struggling to piece together exactly what has to be done. Does adding the hypervisor-k1s role to arch.yaml set up a kubernetes cluster on the specified | 12:22 |
brendangalloway | node, or just install the tools needed for nodepool to talk to the cluster defined in the kube_file in sfconfig.yaml? | 12:22 |
brendangalloway | ok, that is unfortunate. If there is some way to restore the link in the future it would be appreciated | 12:24 |
tristanC | brendangalloway: if you can setup a kubernetes and provide the kube_file that would be the best | 12:24 |
tristanC | brendangalloway: otherwise, using the k1s component will set up a fake kubernetes endpoint that will work for nodepool/zuul workloads, and it will work similarly to runc, e.g. nodepool will be auto configured and there will be a _managed_k1s provider added to the config repo | 12:25 |
tristanC | ftr, the code is currently available here: https://pagure.io/software-factory/k1s | 12:25 |
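A rough sketch of the managed-k1s option described above, assuming the usual arch.yaml inventory layout; the host names, IP addresses, and role lists here are placeholders, not taken from this deployment:

```yaml
# Hypothetical arch.yaml inventory sketch: adding the hypervisor-k1s role
# to a dedicated host so sfconfig auto-configures nodepool with the
# _managed_k1s provider mentioned above. Names and IPs are placeholders.
inventory:
  - name: main
    ip: 192.168.0.10
    roles:
      - install-server
      - nodepool-launcher
  - name: k1s01
    ip: 192.168.0.20
    roles:
      - hypervisor-k1s
```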
brendangalloway | Thanks. I have deployed a kubeadm cluster on another network in our openstack cluster and copied across the admin config file, updated sfconfig.yaml and run sfconfig --no-install | 12:27 |
brendangalloway | The operate nodepool docs refer to the provider defined in _local_hypervisor_k1s.yaml, but that only gets created when using the hypervisor-k1s role? Do I still need to define a provider, or can I simply refer to the one defined in the kube_file? If the latter, how do I do so? | 12:29 |
tristanC | brendangalloway: well the migration from runc to kubernetes remains to be defined and documented :) | 12:30 |
tristanC | brendangalloway: until then, you can setup a custom (not managed by sfconfig) provider, like this one: https://opendev.org/zuul/nodepool/src/branch/master/nodepool/tests/fixtures/kubernetes.yaml#L10-L24 | 12:31 |
tristanC | the `context` attribute should match a context from the kube_file you provided | 12:31 |
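A minimal sketch of such a custom provider entry, modeled on the fixture linked above; the provider, pool, and label names are placeholders, and the context must match an entry in the kube_file given to sfconfig:

```yaml
# Sketch of a custom (non-managed) nodepool provider for an external
# kubernetes cluster. Provider, pool, and label names are hypothetical;
# the context must exist in the kube_file provided in sfconfig.yaml.
providers:
  - name: external-kube
    driver: kubernetes
    context: kubeadm-admin          # a context from the kube_file
    pools:
      - name: main
        labels:
          - name: pod-centos-7      # hypothetical label name
            type: pod
            image: docker.io/centos:7
```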
*** jpena|lunch is now known as jpena | 12:31 | |
brendangalloway | I'm not planning on migrating the existing runc jobs just yet (I see there's a stream of patches that are required to do so), but do want to see if we can use the kube ourselves before then. | 12:33 |
tristanC | brendangalloway: are you using the opendev.org/zuul/zuul-jobs project? | 12:35 |
brendangalloway | I think so - we're using the zuul-jobs that were provided as part of the software-factory install | 12:38 |
tristanC | brendangalloway: ok good, (we are still waiting for some roles to be accepted upstream to help with replacing runc with kubectl), and iirc those are included in the zuul-jobs copy shipped with the software-factory install | 12:40 |
brendangalloway | ok, I will not try to migrate to kubectl just yet. | 12:41 |
brendangalloway | Ok, thanks for all the help. I see the k1s driver provides a mechanism to store dockerfile definitions of images in the config repo itself. Is there a similar mechanism for managing external kubernetes clusters in the CI definitions, or would we have to have any custom containers defined in a repo and just refer to them in the provider | 12:47 |
brendangalloway | definition? | 12:47 |
tristanC | brendangalloway: about the post-config job failing to produce a valid nodepool configuration, i can't find a fix. But looking at the code, i think there may have been an issue with the ansible fact cache mechanism | 12:47 |
tristanC | brendangalloway: in particular, the `--extra-launcher` argument is added if `ansible_hostname` is not the `first_launcher` ( from https://softwarefactory-project.io/cgit/software-factory/sf-config/tree/ansible/roles/sf-nodepool/tasks/update.yml#n26 ) | 12:48 |
brendangalloway | and are there any config requirements for docker images to be used as zuul workers? I see the centos-7 dockerfile example provided for k1s performs the equivalent of the zuul-worker dib element. Do we have to do something similar for external pods? | 12:49 |
tristanC | brendangalloway: and first-launcher is set to be the `name` of the host in the arch file ( from https://softwarefactory-project.io/cgit/software-factory/sf-config/tree/sfconfig/arch.py#n89 ) | 12:49 |
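A rough sketch of the conditional being described, not the actual sf-config update.yml task; `first_launcher` comes from the arch file and `ansible_hostname` from the cached facts:

```yaml
# Rough sketch of the logic in question (hypothetical, not the real task):
# --extra-launcher should only be passed when the host generating the
# configuration is not the first_launcher named in the arch file.
- name: Generate the nodepool configuration
  command: >
    /bin/managesf-configuration nodepool
    --output /etc/nodepool/nodepool.yaml
    --hostname {{ ansible_hostname }}
    {{ '--extra-launcher' if ansible_hostname != first_launcher else '' }}
```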
tristanC | brendangalloway: thus could you share the output of `grep ^first_launcher /var/lib/software-factory/ansible/group_vars/all.yaml` and `ansible -m setup runc | grep ansible_hostname` | 12:51 |
tristanC | and perhaps dropping the file in `/var/lib/software-factory/ansible/facts/` would help fix that issue? | 12:52 |
brendangalloway | I'm guessing this is a problem: | 12:52 |
brendangalloway | ansible -m setup runc | grep ansible_hostname → [WARNING]: Could not match supplied host pattern, ignoring: runc / [WARNING]: No hosts matched, nothing to do | 12:52 |
brendangalloway | first launcher is 'main' | 12:53 |
tristanC | brendangalloway: about managing custom container images, this is currently specific to k1s. We are working on a generalized solution named `zuul-images-jobs`, but that is a lot of work | 12:53 |
tristanC | brendangalloway: i would recommend you manage the image manually at first | 12:54 |
tristanC | e.g. either publish them to a public registry, or push them directly to the host running the kubernetes cluster | 12:54 |
brendangalloway | tristanc: ok, and we would need to prep the image with the zuul-worker steps? We have a private docker repo on site for our containerised components | 12:55 |
brendangalloway | by dropping the file, do you mean I should delete the runc file in that folder? | 12:56 |
tristanC | brendangalloway: it depends on your job, but if you use the upstream jobs like `tox`, there are some RUN statements that help with that, in particular: https://softwarefactory-project.io/cgit/software-factory/sf-config/tree/ansible/roles/sf-repos/files/config/containers/centos-7/Dockerfile#n29 | 12:57 |
tristanC | and installing tools like python3-devel, rsync and such | 12:57 |
tristanC | otherwise any image (with at least util-linux or busybox, and python) should work | 12:58 |
brendangalloway | Don't all worker nodes require a zuul login, zuul-minimum packages and sudo permission for the executor to use them? We've encountered lots of problems with static nodes not being set up in exactly the manner that zuul expects | 13:00 |
brendangalloway | and are there public images that have already been configured as zuul workers? | 13:00 |
tristanC | for kubernetes it is different as ansible will use the `kubectl` connection plugin, e.g. it runs `kubectl exec` commands, thus there is no need for a login or existing user | 13:00 |
tristanC | but some zuul-jobs will perform a `revoke-sudo` task, and this will fail if sudo is not configured, as suggested by the two RUN statements from the `centos-7/Dockerfile#n29` link above | 13:03 |
brendangalloway | ok, so what needs to be installed will depend on the job | 13:03 |
tristanC | brendangalloway: yes. And the `centos-7/Dockerfile` should provide something equivalent to the default runc-centos label | 13:04 |
brendangalloway | and if we want to implement any jobs that use/inherit the fetch output role, we need to use the new role as in the patches at https://review.opendev.org/#/q/topic:zuul-jobs-with-kubectl ? | 13:05 |
tristanC | that is correct, any roles that perform synchronize are going to fail otherwise | 13:07 |
brendangalloway | specifically synchronise to the worker? | 13:08 |
brendangalloway | not synchronise in general, for example copying the log files across to the elk node | 13:09 |
tristanC | brendangalloway: the roles that use synchronize to fetch artifact files from the worker to the executor need to be adapted to copy the files to the ~/zuul-output directory and let the base job fetch them | 13:09 |
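A hypothetical sketch of that adaptation: instead of synchronizing artifacts back to the executor, the job copies them into ~/zuul-output on the worker and lets the base job's fetch-output role collect them; the artifact path below is a placeholder:

```yaml
# Hypothetical tasks: copy artifacts into ~/zuul-output on the worker so
# the base job can fetch them, instead of using synchronize from the pod.
- name: Ensure the zuul-output logs directory exists
  file:
    path: "{{ ansible_user_dir }}/zuul-output/logs"
    state: directory

- name: Copy job logs into zuul-output
  copy:
    src: "{{ ansible_user_dir }}/workspace/job.log"   # placeholder artifact
    dest: "{{ ansible_user_dir }}/zuul-output/logs/"
    remote_src: true
```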
tristanC | brendangalloway: about the config-update job failure, i guess you were able to fix it manually in /usr/share/sf-config/ansible/roles/sf-nodepool/tasks/update.yml ? | 13:10 |
brendangalloway | Ok - I think the only places we use that at the moment are in the runc containers | 13:10 |
tristanC | brendangalloway: iiuc that failure, you have a host in the arch file named `main`, and another one named `runc`? | 13:10 |
brendangalloway | I fixed it by running the bin by hand without the --extra-launcher flag and then restarting nodepool | 13:11 |
brendangalloway | yes, the runc containers are on a separate host to the executor | 13:11 |
tristanC | brendangalloway: that may be reverted once the config-update job runs again; until we understand the issue and have a fix, you'd better remove the `--extra-launcher` argument from the `sf-nodepool/tasks/update.yml` file | 13:12 |
brendangalloway | I will do so. Now that you mention that - when running sfconfig --upgrade I had to edit the timeout command in /usr/share/sf-config/ansible/roles/sf-elasticsearch/tasks/postconf.yml +42 | 13:17 |
brendangalloway | what is the correct way to submit bug reports or similar for issues like that? | 13:18 |
tristanC | brendangalloway: on this page: https://tree.taiga.io/project/morucci-software-factory/issues | 13:19 |
brendangalloway | tristanC: Thank you so much for all the help. I think that is everything I needed to know | 13:22 |
tristanC | brendangalloway: you're welcome, thank you for the feedback! | 13:29 |
tristanC | brendangalloway: so we are looking at the `extra-launcher` issue, and it seems like ansible may be using incorrect facts. Could you check the `ansible_hostname` values in `/var/lib/software-factory/ansible/facts` and see if they are consistent? If not, i think dropping the file should be enough to prevent that failure, but we'll add some checks to at least detect when this is happening | 13:30 |
tristanC | turns out we just had a similar issue in another deployment, and the `ansible_hostname` from the fact is not the same as the hostname of the host, resulting in that incorrect `extra-launcher` argument being set | 13:31 |
brendangalloway | would hostname vs FQDN be a big enough difference? | 13:32 |
brendangalloway | the facts file has the hostname as runc, the arch file has it as runc.domain | 13:33 |
tristanC | we would like the ansible_hostname to match what is in the arch file, that is just the name, without the domain | 13:33 |
brendangalloway | so I would need to remove the domains in the arch file? | 13:34 |
tristanC | brendangalloway: i guess you are not running the nodepool-launcher on the runc host, thus i suspect the "main" host has an `ansible_hostname` fact that refers to runc | 13:34 |
brendangalloway | yes nodepool launcher is on main.ci, runc containers are running on runc.ci | 13:35 |
tristanC | brendangalloway: changing the runc host name from the arch file shouldn't be required | 13:35 |
brendangalloway | So I must change it in the facts file? | 13:36 |
tristanC | brendangalloway: in our case, we found that `grep ansible_hostname /var/lib/software-factory/ansible/facts/main-host.org.name` shows `nodepool-builder`, instead of `main-host` | 13:36 |
tristanC | iirc, removing the fact file should ensure that ansible_hostname is correct, but we are looking for the why, and how to prevent that :) | 13:38 |
brendangalloway | !! | 13:41 |
openstack | brendangalloway: Error: "!" is not a valid command. | 13:41 |
brendangalloway | [root@main.domain facts]# grep ansible_hostname * → builder.domain: "ansible_hostname": "builder", elk.domain: "ansible_hostname": "elk", main.domain: "ansible_hostname": "runc", merger.domain: "ansible_hostname": "merger", runc.domain: "ansible_hostname": "runc" | 13:42 |
brendangalloway | so the hostname being runc in the main file is the issue? | 13:42 |
tristanC | brendangalloway: yes | 13:42 |
tristanC | that confused the nodepool.yaml generation logic, resulting in an empty provider list | 13:43 |
brendangalloway | any idea how that file ended up being wrong? | 13:43 |
brendangalloway | I assume I should change that and revert the change to the upgrade task? | 13:43 |
tristanC | that's the issue we are trying to understand | 13:44 |
tristanC | once this is fixed, the upgrade task bandaid can be reverted | 13:44 |
brendangalloway | Is there other information I can provide that would help you understand the issue? | 13:45 |
tristanC | that's ok thank you, we are debugging an affected setup | 13:45 |
brendangalloway | ok, let me know if I can assist | 13:46 |
sfbender | Daniel Pawlik created software-factory/managesf master: DNM - Added external-project parameter to compute repo by hound https://softwarefactory-project.io/r/18204 | 14:56 |
sfbender | Tristan de Cacqueray created software-factory/sf-config master: sfconfig: add an update facts task https://softwarefactory-project.io/r/18205 | 14:56 |
tristanC | brendangalloway: https://softwarefactory-project.io/r/18205 seems to be a solution for the invalid fact ansible_hostname | 14:57 |
*** dpawlik has quit IRC | 15:59 | |
*** jpena is now known as jpena|off | 17:04 | |
*** sshnaidm is now known as sshnaidm|afk | 18:07 | |
sfbender | Merged www.softwarefactory-project.io master: Add previous sprints summaries https://softwarefactory-project.io/r/18069 | 18:10 |
*** brendangalloway has quit IRC | 19:20 | |
*** rfolco has quit IRC | 21:27 | |
*** rfolco has joined #softwarefactory | 22:03 |