recyclehero | jrosser: I see your name for this all over the place. BTW I set up my first debug task right | 00:08 |
---|---|---|
recyclehero | Oct 15 03:31:40 infra1-utility-container-3e3911b0 ansible-openstack.cloud.os_project[12159]: Invoked with cloud=default state=present name=service description=Keystone Identity Service domain_id=default endpoint_type=admin validate_certs=True interface=admin wait=True timeout=180 properties={} enabled=True auth_type=None auth=NOT_LOGGING_PARAMETER region_name=None availability_zone=None | 00:08 |
recyclehero | ca_cert=None client_cert=None client_key=NOT_LOGGING_PARAMETER api_timeout=None | 00:08 |
recyclehero | task [OS_keystone: Add service project] | 00:08 |
recyclehero | error: openstacksdk required | 00:08 |
recyclehero | the log is from the utility container | 00:08 |
recyclehero | I added a debug task and saw that regardless of what I set, keystone_service_setup_host: "{{ groups['utility_all'][0] }}" resolves to the utility container | 00:10 |
recyclehero | I can't proceed with setup-openstack | 00:11 |
recyclehero | it's my restore-attempt deployment | 00:11 |
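For reference, the kind of debug task recyclehero describes is a one-liner; a minimal sketch (its placement within the keystone play is an assumption):

```yaml
- name: Show which host keystone service setup will delegate to
  debug:
    var: keystone_service_setup_host
```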
*** gillesMo has joined #openstack-ansible | 00:11 | |
recyclehero | jrosser: I think it didn't respect my --regen switch when recreating the tokens! | 00:15 |
*** macz_ has joined #openstack-ansible | 00:15 | |
*** rf0lc0 has joined #openstack-ansible | 00:18 | |
*** macz_ has quit IRC | 00:20 | |
*** MickyMan77 has quit IRC | 00:42 | |
*** MickyMan77 has joined #openstack-ansible | 00:42 | |
*** MickyMan77 has quit IRC | 00:51 | |
*** gyee has quit IRC | 01:04 | |
openstackgerrit | Merged openstack/openstack-ansible-galera_server stable/train: Bump galera version https://review.opendev.org/757483 | 01:05 |
*** rf0lc0 has quit IRC | 01:06 | |
*** macz_ has joined #openstack-ansible | 01:09 | |
*** macz_ has quit IRC | 01:14 | |
*** MickyMan77 has joined #openstack-ansible | 01:20 | |
*** NewJorg has quit IRC | 01:28 | |
*** MickyMan77 has quit IRC | 01:29 | |
*** cshen has joined #openstack-ansible | 01:36 | |
*** cshen has quit IRC | 01:40 | |
*** spatel has joined #openstack-ansible | 01:44 | |
*** MickyMan77 has joined #openstack-ansible | 02:03 | |
*** MickyMan77 has quit IRC | 02:11 | |
*** NewJorg has joined #openstack-ansible | 02:45 | |
*** MickyMan77 has joined #openstack-ansible | 02:46 | |
*** MickyMan77 has quit IRC | 02:55 | |
*** MickyMan77 has joined #openstack-ansible | 03:35 | |
*** MickyMan77 has quit IRC | 04:19 | |
*** MickyMan77 has joined #openstack-ansible | 04:20 | |
*** evrardjp has quit IRC | 04:33 | |
*** evrardjp has joined #openstack-ansible | 04:33 | |
*** MickyMan77 has quit IRC | 04:35 | |
*** spatel has quit IRC | 04:38 | |
*** MickyMan77 has joined #openstack-ansible | 04:39 | |
*** nurdie has quit IRC | 04:43 | |
*** MickyMan77 has quit IRC | 04:48 | |
*** MickyMan77 has joined #openstack-ansible | 04:50 | |
*** MickyMan77 has quit IRC | 05:00 | |
*** MickyMan77 has joined #openstack-ansible | 05:03 | |
*** MickyMan77 has quit IRC | 05:07 | |
*** MickyMan77 has joined #openstack-ansible | 05:10 | |
*** MickyMan77 has quit IRC | 05:16 | |
*** miloa has joined #openstack-ansible | 06:02 | |
*** jbadiapa has joined #openstack-ansible | 06:38 | |
*** andrewbonney has joined #openstack-ansible | 07:01 | |
*** MickyMan77 has joined #openstack-ansible | 07:05 | |
openstackgerrit | Dmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible-os_magnum master: Use openstack_service_*uri_proto vars by default https://review.opendev.org/410681 | 07:21 |
*** rpittau|afk is now known as rpittau | 07:22 | |
jrosser | morning | 07:25 |
*** cshen has joined #openstack-ansible | 07:26 | |
noonedeadpunk | morning | 07:26 |
noonedeadpunk | jrosser: did you get the idea of https://review.opendev.org/#/c/758207 vs https://review.opendev.org/#/c/737221/ ? | 07:27 |
*** yolanda__ has joined #openstack-ansible | 07:27 | |
noonedeadpunk | and what does the first one even do? | 07:28 |
masterpe | Isn't it better to look at the number of cores/CPUs that are available? | 07:28 |
*** maharg101 has joined #openstack-ansible | 07:29 | |
masterpe | to determine the number of threads? | 07:29 |
jrosser | i am not sure i understand the second patch | 07:30 |
jrosser | https://review.opendev.org/#/c/758207/ <- that one | 07:30 |
noonedeadpunk | technically the second one posted is yours, but I guess you don't get the other one (same for me) | 07:30 |
noonedeadpunk | well, Adri2000 posted a comment on 737221 and that's how it came to my attention | 07:31 |
jrosser | the sed looks weird | 07:32 |
noonedeadpunk | and what `ANSIBLE_FORKS_VALUE` is - I have no idea | 07:32 |
jrosser | no me neither | 07:32 |
*** cloudnull has quit IRC | 07:33 | |
noonedeadpunk | ok, good, then it's not me not seeing something obvious :) or at least not only me | 07:33 |
jrosser | my number of 20 was kind of arbitrary | 07:33 |
jrosser | based on the recommendations for AIO cpus really | 07:34 |
noonedeadpunk | but an AIO kind of requires way more CPUs than are needed for just a deploy host? | 07:34 |
*** cloudnull has joined #openstack-ansible | 07:34 | |
jrosser | but it's based on the maximum number of containers in the control plane | 07:34 |
jrosser | yeah but it's deploy host threads though, and i'm not sure that means fully utilised CPUs | 07:35 |
jrosser | it's number of parallel tasks | 07:35 |
noonedeadpunk | like I have 2 cpus for some deploy host :p | 07:35 |
jrosser | right, but if you run 20 tasks in parallel which each take N seconds on the target you dont need N cpus on the deploy host at 100% to do that? | 07:36 |
jrosser | ^ confused use of N there, sorry | 07:36 |
jrosser | right, but if you run 20 tasks in parallel which each take N seconds on the target you dont need 20 cpus on the deploy host at 100% to do that? | 07:36 |
noonedeadpunk | depends on how many seconds each task takes, honestly | 07:36 |
noonedeadpunk | in terms of threads and cpu cycles | 07:37 |
jrosser | hmm well maybe we need something dynamic then | 07:38 |
noonedeadpunk | and what really disturbs me is the number of ssh sessions here. I know we talked about that, but we need to guarantee that we do increase them | 07:38 |
noonedeadpunk | or deployers will | 07:38 |
jrosser | what i saw was with an AIO (8 cpu / 8G) it would run the tasks in several batches, particularly for lots of containers on the controller | 07:39 |
jrosser | and you could get a good speedup by increasing the forks to make it do just one batch | 07:39 |
noonedeadpunk | and for aio we have https://opendev.org/openstack/openstack-ansible/src/branch/master/tests/bootstrap-aio.yml#L69 which is not applied to regular deployments | 07:40 |
noonedeadpunk | and we also need to adjust this https://opendev.org/openstack/openstack-ansible/src/branch/master/doc/source/admin/maintenance-tasks/ansible-modules.rst#user-content-ansible-forks | 07:40 |
jrosser | tbh that feels like out-of-date info | 07:41 |
jrosser | but if this is difficult we can leave it, or just make an optimisation for CI | 07:42 |
noonedeadpunk | well, partially | 07:42 |
Adri2000 | in https://review.opendev.org/#/c/758207/ ANSIBLE_FORKS_VALUE is a placeholder in openstack-ansible.rc ... and that is replaced by the actual value | 07:42 |
noonedeadpunk | and what is that value? | 07:42 |
*** tosky has joined #openstack-ansible | 07:42 | |
Adri2000 | it's computed in scripts-library.sh | 07:43 |
Adri2000 | my patch assumes we don't remove https://review.opendev.org/#/c/737221/2/scripts/scripts-library.sh | 07:43 |
noonedeadpunk | well, it doesn't have ANSIBLE_FORKS_VALUE | 07:43 |
noonedeadpunk | and according to http://codesearch.openstack.org/?q=ANSIBLE_FORKS_VALUE&i=nope&files=&repos= this var is introduced with that patch | 07:44 |
Adri2000 | `sed -i "s|ANSIBLE_FORKS_VALUE|${ANSIBLE_FORKS}|g" /usr/local/bin/openstack-ansible.rc` < the ANSIBLE_FORKS_VALUE placeholder in openstack-ansible.rc is replaced by ${ANSIBLE_FORKS} which is computed in scripts-library.sh | 07:44 |
* jrosser confused | 07:44 | |
Adri2000 | this change https://review.opendev.org/#/c/758207/3/scripts/openstack-ansible.rc makes sure the ANSIBLE_FORKS env var is defined to either a user defined ANSIBLE_FORKS env var or to ANSIBLE_FORKS_VALUE which will be an actual number (it will have been replaced by bootstrap-ansible.sh) | 07:46 |
Adri2000 | this change https://review.opendev.org/#/c/758207/3/scripts/bootstrap-ansible.sh replaces ANSIBLE_FORKS_VALUE with the value ${ANSIBLE_FORKS} which is computed in scripts-library.sh (https://review.opendev.org/#/c/737221/2/scripts/scripts-library.sh) | 07:46 |
noonedeadpunk | I still didn't get why in the world we need an ANSIBLE_FORKS_VALUE env var | 07:47 |
Adri2000 | there is no ANSIBLE_FORKS_VALUE env var | 07:47 |
noonedeadpunk | and why we need to replace something that does not exist? | 07:47 |
noonedeadpunk | well, and you introduce it here ? https://review.opendev.org/#/c/758207/3/scripts/openstack-ansible.rc | 07:48 |
Adri2000 | `export ANSIBLE_FORKS="${ANSIBLE_FORKS:-ANSIBLE_FORKS_VALUE}"` will become e.g. `export ANSIBLE_FORKS="${ANSIBLE_FORKS:-10}"` after you've run bootstrap-ansible | 07:48 |
noonedeadpunk | by defining it as default for ANSIBLE_FORKS? | 07:48 |
Adri2000 | I define an ANSIBLE_FORKS env var | 07:48 |
Adri2000 | just look at the line before, it's the same model: `export ANSIBLE_PYTHON_INTERPRETER="${ANSIBLE_PYTHON_INTERPRETER:-OSA_ANSIBLE_PYTHON_INTERPRETER}"` | 07:48 |
Adri2000 | OSA_ANSIBLE_PYTHON_INTERPRETER is a placeholder as well | 07:49 |
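Pulling Adri2000's explanation together, the placeholder mechanism works like this (a sketch; the value 10 is an example):

```sh
# scripts/openstack-ansible.rc as shipped in the repo (template):
export ANSIBLE_FORKS="${ANSIBLE_FORKS:-ANSIBLE_FORKS_VALUE}"

# bootstrap-ansible.sh substitutes the placeholder with the value
# computed in scripts-library.sh:
sed -i "s|ANSIBLE_FORKS_VALUE|${ANSIBLE_FORKS}|g" /usr/local/bin/openstack-ansible.rc

# resulting line in the installed rc file:
export ANSIBLE_FORKS="${ANSIBLE_FORKS:-10}"
```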
noonedeadpunk | but we have this var defined https://opendev.org/openstack/openstack-ansible/src/branch/master/scripts/bootstrap-ansible.sh#L59 | 07:49 |
jrosser | imho this is confusing because of the overloading of ANSIBLE_FORKS | 07:49 |
Adri2000 | yes | 07:49 |
Adri2000 | this https://opendev.org/openstack/openstack-ansible/src/branch/master/scripts/bootstrap-ansible.sh#L46 defines ${ANSIBLE_FORKS} | 07:50 |
jrosser | to make this clean the only place that ANSIBLE_FORKS should be defined is in the .rc file | 07:50 |
jrosser | and we should have a different var OSA_ANSIBLE_FORKS calculated in scripts library which becomes the default value, based on the the deploy host CPUs | 07:50 |
noonedeadpunk | +1 | 07:51 |
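A sketch of what jrosser is proposing (the names and the CPU heuristic are illustrative, not merged code):

```sh
# scripts-library.sh: compute a default based on deploy host CPUs,
# capped at 20 as discussed above (this heuristic is an assumption)
CPUS=$(nproc)
OSA_ANSIBLE_FORKS=$(( CPUS > 20 ? 20 : CPUS ))

# openstack-ansible.rc: the only place ANSIBLE_FORKS gets defined,
# still deferring to any user-supplied value
export ANSIBLE_FORKS="${ANSIBLE_FORKS:-OSA_ANSIBLE_FORKS_VALUE}"
```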
Adri2000 | that's fine by me. I believe my patch works as is, but I can understand you find the naming confusing. I kind of wanted to make the patch as small as possible to fix the actual problem I had. (I had to use -f X on each openstack-ansible run to define the number of forks) | 07:52 |
jrosser | making the code more obvious is a good thing, so if the patch is a bit bigger then thats totally OK | 07:54 |
noonedeadpunk | another option for me would be just dropping ANSIBLE_FORKS_VALUE and the sed related to it | 07:54 |
noonedeadpunk | but I like jrosser's idea | 07:54 |
noonedeadpunk | but actually, I think we should decide whether we just want some static bump | 07:55 |
noonedeadpunk | as I like this idea as well, except for the nuance with MaxSessions | 07:55 |
noonedeadpunk | but considering that it will be pretty easy to override in bashrc... | 07:56 |
noonedeadpunk | maybe we should really just increase number of forks for aio? | 07:57 |
jrosser | simple is good too | 07:58 |
noonedeadpunk | but yeah, again, with my 2 cores I can really run way more threads than 2 | 07:58 |
jrosser | good question | 08:02 |
noonedeadpunk | so I see 3 roads: keep it based on the number of CPUs (which is not so effective, considering we should probably multiply it) and make use of ANSIBLE_FORKS; add a MaxSessions bump for sshd on the deploy host somehow; or just set a fixed number like 10? | 08:02 |
noonedeadpunk | well, actually starting to use ANSIBLE_FORKS will be applicable anyway | 08:03 |
jrosser | yes, because we can't even make an AIO/CI special case without that | 08:03 |
noonedeadpunk | so https://review.opendev.org/#/c/758207/ might be a good and backportable shot | 08:04 |
jrosser | fwiw i am using LXDs for deploy hosts, a bunch of them on the same machine | 08:04 |
jrosser | so they all have all the host CPUs should they need them | 08:04 |
jrosser | yes i think https://review.opendev.org/#/c/758207/ is good, if the variable names get regularised to OSA_.... for things replaced in the rc file | 08:06 |
noonedeadpunk | well, we actually need to bump ssh sessions just for lxc hosts | 08:06 |
Adri2000 | jrosser: will prepare and push that change right now so you can both have a look | 08:08 |
jrosser | recyclehero: the openstacksdk is installed into the utility container venv here https://github.com/openstack/openstack-ansible/blob/master/playbooks/utility-install.yml#L176 | 08:09 |
jrosser | recyclehero: the list of things which get installed into the utility venv is here https://opendev.org/openstack/openstack-ansible/src/branch/master/inventory/group_vars/utility_all.yml#L75 | 08:10 |
jrosser | recyclehero: handlers run at the end of a play, but only if the task which notifies them has a status of 'changed' | 08:11 |
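A generic illustration of the handler behaviour jrosser describes (not OSA code):

```yaml
- hosts: all
  tasks:
    - name: Notify the handler only when this task reports 'changed'
      copy:
        dest: /tmp/example.conf
        content: "setting=1\n"
      notify: restart example

  handlers:
    # runs at the end of the play, and only if notified
    - name: restart example
      debug:
        msg: "handler triggered"
```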
openstackgerrit | Adrien Cunin proposed openstack/openstack-ansible master: Actually use ANSIBLE_FORKS in openstack-ansible.rc https://review.opendev.org/758207 | 08:11 |
*** CeeMac has joined #openstack-ansible | 08:12 | |
noonedeadpunk | I wish we dropped these seds | 08:24 |
*** yolanda__ has quit IRC | 08:24 | |
*** yolanda__ has joined #openstack-ansible | 08:24 | |
openstackgerrit | Dmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible-lxc_hosts master: Increase amount of MaxSessions https://review.opendev.org/758364 | 08:25 |
openstackgerrit | Dmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible master: Increase default ansible forks from 5 to 20 https://review.opendev.org/737221 | 08:30 |
openstackgerrit | Dmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible master: Increase default ansible forks from 5 to 20 https://review.opendev.org/737221 | 08:30 |
* noonedeadpunk still can't understand 758207 and why we set the default of ANSIBLE_FORKS to the undefined OSA_ANSIBLE_FORKS ;( | 08:33 | |
noonedeadpunk | that's kind of what we're doing http://paste.openstack.org/show/799066/ | 08:36 |
recyclehero | morning | 08:36 |
noonedeadpunk | ah ok | 08:36 |
recyclehero | jrosser: so I should see openstacksdk in /openstack/venvs/utility-21.0.1/bin inside the openstack container | 08:36 |
noonedeadpunk | in case we have ANSIBLE_FORKS defined we will replace it with the actual number | 08:36 |
recyclehero | *utility container | 08:37 |
jrosser | openstacksdk is the name of the python package | 08:37 |
recyclehero | for some reason it isn't present on the utility container | 08:38 |
recyclehero | or maybe a symlink isn't created | 08:39 |
recyclehero | some of these tasks have run_once on them | 08:40 |
recyclehero | could that be causing this? | 08:40 |
jrosser | how have you confirmed that openstacksdk is not present? | 08:41 |
recyclehero | task [OS_keystone: Add service project] error | 08:42 |
recyclehero | it says openstack sdk required | 08:42 |
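A more direct check than relying on the task error would be something like this inside the utility container (a sketch; the venv path is from recyclehero's earlier message):

```sh
/openstack/venvs/utility-21.0.1/bin/pip show openstacksdk
/openstack/venvs/utility-21.0.1/bin/python -c "import openstack"
```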
jrosser | you said you have some trouble with deploying the utility container | 08:44 |
jrosser | if that's somehow not worked then the rest of the roles are not going to work | 08:44 |
recyclehero | can I delete the container and redeploy it? | 08:45 |
jrosser | you can do that, yes | 08:47 |
recyclehero | the link is there openstack -> /openstack/venvs/utility-21.0.1/bin/openstack | 08:47 |
recyclehero | but the actual binary ain't | 08:47 |
recyclehero | so delete with lxc and run utility-install | 08:49 |
jrosser | you could try re-running the utility playbook with -e venv_rebuild=yes | 08:49 |
recyclehero | doh, too late. msg": "Destination /var/lib/lxc/infra1_utility_container-3e3911b0/config does not exist ! | 08:51 |
recyclehero | I think I should do setup-host too | 08:51 |
jrosser | recyclehero: there is a playbook specifically to create the containers, and you can use --limit to make it very specific which will speed things up considerably | 08:56 |
jrosser | it's worth spending some time understanding what's inside the setup-*.yml playbooks, because all of the more granular things can be called directly | 08:57 |
recyclehero | I will go with --limit first to see how it works, then I will check that out. thanks | 08:58 |
jrosser | https://docs.openstack.org/openstack-ansible/latest/admin/maintenance-tasks.html#destroy-and-recreate-containers | 09:01 |
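The documented flow boils down to something like this (a sketch based on the linked page; the container name is from recyclehero's log, and you may also need the parent host in the --limit):

```sh
# destroy just the broken container, recreate it, then rerun its playbook
openstack-ansible lxc-containers-destroy.yml --limit infra1_utility_container-3e3911b0
openstack-ansible lxc-containers-create.yml --limit infra1_utility_container-3e3911b0
openstack-ansible utility-install.yml
```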
MickyMan77 | which openstack-ansible version is the latest stable for use with CentOS 8? | 09:09 |
jrosser | MickyMan77: that would be 21.1.0 on the ussuri branch | 09:11 |
jrosser | noonedeadpunk: we need to fix the uwsgi role https://review.opendev.org/#/c/758108/ | 09:17 |
openstackgerrit | Jonathan Rosser proposed openstack/openstack-ansible-os_tempest master: Fix tempest init logic https://review.opendev.org/753393 | 09:26 |
openstackgerrit | Jonathan Rosser proposed openstack/openstack-ansible-galera_server master: Update galera to 10.5.6 https://review.opendev.org/742105 | 09:29 |
openstackgerrit | Jonathan Rosser proposed openstack/openstack-ansible-lxc_hosts master: Increase amount of MaxSessions https://review.opendev.org/758364 | 09:31 |
recyclehero | when does password setting take place? like for the placement service in keystone | 09:31 |
jrosser | https://github.com/openstack/openstack-ansible-os_placement/blob/master/tasks/main.yml#L87-L115 | 09:33 |
*** Nick_A has quit IRC | 09:34 | |
openstackgerrit | Merged openstack/openstack-ansible-os_placement stable/train: Trigger service restart https://review.opendev.org/757745 | 09:45 |
openstackgerrit | Merged openstack/openstack-ansible-os_cinder stable/ussuri: Trigger uwsgi restart https://review.opendev.org/757712 | 10:02 |
openstackgerrit | Erik Berg proposed openstack/openstack-ansible stable/ussuri: WIP/DNM: Upgrade ceph to octopus during run-upgrade.sh to ussuri https://review.opendev.org/758382 | 10:07 |
*** admin0 has quit IRC | 10:27 | |
*** yolanda__ is now known as yolanda | 10:52 | |
recyclehero | guys how do you deploy without getting logged out as an effect of security_hardening of the hosts? nohup, output redirection? | 11:08 |
recyclehero | I deploy from infra1 | 11:11 |
openstackgerrit | Merged openstack/ansible-role-uwsgi master: Add vars file for ubuntu bionic https://review.opendev.org/758108 | 11:11 |
recyclehero | when it takes a while, for example on wheel builds, I get logged out either on ssh or localhost login | 11:13 |
*** dave-mccowan has joined #openstack-ansible | 11:16 | |
openstackgerrit | Dmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible master: Fix infra jobs https://review.opendev.org/758399 | 11:20 |
openstackgerrit | Dmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible-galera_server master: Update galera to 10.5.6 https://review.opendev.org/742105 | 11:21 |
*** yann-kaelig has joined #openstack-ansible | 11:32 | |
ebbex | recyclehero: i deploy from a vm with screen, so i'm not sure, but you can set "security_rhel7_session_timeout: 0" in user_variables.yml, and "ServerAliveInterval 300" in your .ssh/config | 11:33 |
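Spelling out ebbex's two settings (a sketch; the host alias is an example):

```yaml
# /etc/openstack_deploy/user_variables.yml on the deploy host
security_rhel7_session_timeout: 0
```

```
# ~/.ssh/config on the machine you ssh *from*
Host infra1
    ServerAliveInterval 300
```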
openstackgerrit | Merged openstack/openstack-ansible-os_nova master: Enable notifications when Designate is enabled https://review.opendev.org/757904 | 11:37 |
openstackgerrit | Merged openstack/openstack-ansible-os_nova stable/ussuri: Simplify scheduler filter additions https://review.opendev.org/757858 | 11:37 |
openstackgerrit | Erik Berg proposed openstack/openstack-ansible-os_nova stable/train: Simplify scheduler filter additions https://review.opendev.org/758404 | 11:43 |
recyclehero | ebbex: thanks, I changed the ServerAliveInterval. | 11:46 |
recyclehero | but it gets overwritten by ansible! | 11:53 |
openstackgerrit | Dmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible-os_nova stable/ussuri: Remove backported release note https://review.opendev.org/758406 | 11:54 |
jrosser | recyclehero: ebbex uses a separate deploy host ("i deploy from a vm") so i believe he was referring to that host, not infra1 | 11:55 |
recyclehero | it actually didn't overwrite ServerAliveInterval in sshd_config, but I set it to 0 and still got disconnected. I should look for something like a session timeout in the playbook and reverse it. it kicks me out in the middle of deployment | 11:57 |
jrosser | if you are deploying from one of the target hosts then you'll need to set variables that change what the hardening role does | 11:57 |
ebbex | recyclehero: ServerAliveInterval 300 in .ssh/config on your local machine. The one you ssh *from* into infra1. | 11:58 |
jrosser | ebbex: i think infra1 is the deploy host here | 11:58 |
ebbex | yeah, and he's getting disconnected from infra1 right? | 11:59 |
jrosser | won't the session timeout always win? | 11:59 |
ebbex | session timeout will win yes. | 12:00 |
jrosser | i think your advice to adjust security_rhel7_session_timeout is whats needed | 12:00 |
ebbex | cause he's deploying from a host that gets deployed to? | 12:00 |
jrosser | either way i think? | 12:01 |
recyclehero | jrosser: yes, but I am on debian. | 12:01 |
recyclehero | yes it's common, as I don't see anything distro-specific in the role | 12:02 |
ebbex | he probably needs both. one to prevent session timeout on infra1, and two to prevent ssh disconnects from infra1 to his computer. | 12:02 |
openstackgerrit | Georgina Shippey proposed openstack/ansible-role-systemd_service master: Greater flexibility to timer templating https://review.opendev.org/758408 | 12:05 |
openstackgerrit | Adrien Cunin proposed openstack/openstack-ansible-os_nova stable/ussuri: Enable notifications when Designate is enabled https://review.opendev.org/758411 | 12:09 |
recyclehero | it's well hardened! declare -rx TMOUT="600" - can't easily override it (read-only). I am going cuckoo :D | 12:09 |
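This is what recyclehero is running into: the hardening role exports TMOUT read-only, and a readonly variable can't be reassigned in the same shell:

```sh
$ declare -rx TMOUT="600"
$ TMOUT=0
-bash: TMOUT: readonly variable
```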
openstackgerrit | Adrien Cunin proposed openstack/openstack-ansible-os_nova stable/train: Enable notifications when Designate is enabled https://review.opendev.org/758412 | 12:09 |
*** cshen has quit IRC | 12:09 | |
openstackgerrit | Adrien Cunin proposed openstack/openstack-ansible-os_nova stable/stein: Enable notifications when Designate is enabled https://review.opendev.org/758413 | 12:09 |
recyclehero | at least now I know I should press enter every 10 minutes. | 12:11 |
ebbex | that's what setting security_rhel7_session_timeout is for, no? | 12:12 |
ebbex | recyclehero: ^ | 12:12 |
*** rf0lc0 has joined #openstack-ansible | 12:15 | |
openstackgerrit | Erik Berg proposed openstack/openstack-ansible-os_nova stable/train: Simplify scheduler filter additions https://review.opendev.org/758404 | 12:17 |
recyclehero | ebbex: I wanted to do that in place. I am on it now | 12:22 |
jrosser | noonedeadpunk: i think for stable branch octavia jobs we are really stuck like this https://review.opendev.org/#/c/672556/ | 12:31 |
jrosser | i don't think there is a suitable amphora image for that openstack release | 12:31 |
noonedeadpunk | jrosser: let's just disable the tests then? | 12:34 |
jrosser | yep - that would do it, the patch makes sense otherwise | 12:35 |
noonedeadpunk | btw, any thoughts about https://review.opendev.org/#/c/752059/ ? | 12:35 |
jrosser | yeah - i think we are having similar issues | 12:35 |
jrosser | just not got round to check | 12:36 |
noonedeadpunk | got it | 12:36 |
noonedeadpunk | (I just got pinged in the related bug report about what state it's in) | 12:36 |
jrosser | i'll see if we can test it, but it's not going to be immediate | 12:37 |
jrosser | we try to collect logs with journalbeat only on the hosts, and i think some were missing | 12:40 |
jrosser | which could be broken mounts | 12:40 |
*** macz_ has joined #openstack-ansible | 12:40 | |
openstackgerrit | Jonathan Rosser proposed openstack/openstack-ansible-os_tempest master: Fix tempest init logic https://review.opendev.org/753393 | 12:43 |
*** macz_ has quit IRC | 12:45 | |
*** cshen has joined #openstack-ansible | 12:57 | |
openstackgerrit | Jonathan Rosser proposed openstack/openstack-ansible-os_octavia stable/rocky: Save iptables rules for all Debian derivative operating systems https://review.opendev.org/672556 | 13:04 |
openstackgerrit | Jonathan Rosser proposed openstack/openstack-ansible-lxc_hosts master: Increase amount of MaxSessions https://review.opendev.org/758364 | 13:05 |
MickyMan77 | Finally :) I have a working openstack farm except for the network part. I can see dhcp requests from the instances on the br-vlan interface when i do a tcpdump. | 13:09 |
MickyMan77 | The instances do not get any ip address on the nic. ----> http://paste.openstack.org/show/799080/ | 13:09 |
jrosser | MickyMan77: there are some quite comprehensive checklists here https://docs.openstack.org/openstack-ansible/latest/admin/troubleshooting.html | 13:22 |
openstackgerrit | Merged openstack/openstack-ansible-os_nova stable/ussuri: Remove backported release note https://review.opendev.org/758406 | 13:25 |
openstackgerrit | Jonathan Rosser proposed openstack/openstack-ansible-os_ironic master: Updated from OpenStack Ansible Tests https://review.opendev.org/755536 | 13:30 |
*** nurdie has joined #openstack-ansible | 13:36 | |
openstackgerrit | Merged openstack/openstack-ansible-lxc_hosts stable/train: copy the actual keyring https://review.opendev.org/731626 | 13:37 |
*** sshnaidm has quit IRC | 13:54 | |
openstackgerrit | Jonathan Rosser proposed openstack/openstack-ansible master: Switch from ansible-base + collections to ansible package https://review.opendev.org/758431 | 13:59 |
*** sshnaidm has joined #openstack-ansible | 14:00 | |
jrosser | goodness me how can ansible galaxy be *so* unreliable | 14:21 |
jrosser | like 80% of our jobs are failing | 14:21 |
*** macz_ has joined #openstack-ansible | 14:28 | |
*** gshippey has joined #openstack-ansible | 14:29 | |
* noonedeadpunk in ansible contributor summit and going to rent about that | 14:32 | |
noonedeadpunk | *rant | 14:32 |
*** macz_ has quit IRC | 14:33 | |
noonedeadpunk | jrosser: do you have the bug report you wrote somewhere handy? | 14:34 |
jrosser | hmm just need to find it! | 14:36 |
jrosser | noonedeadpunk: https://github.com/ansible/galaxy/issues/2302 | 14:37 |
noonedeadpunk | thanks! | 14:37 |
*** rgogunskiy has joined #openstack-ansible | 14:44 | |
*** miloa has quit IRC | 14:53 | |
*** macz_ has joined #openstack-ansible | 15:01 | |
Adri2000 | would be happy to get a few more opinions on https://review.opendev.org/#/c/729533/ - I think the topic basically boils down to how we define "data" in the context of OSA LXC containers. I have always assumed that OSA LXC containers' "data" were the directories bind mounted from /openstack/... into the LXC containers, such as /var/lib/mysql/ for galera containers. which means that there | 15:03 |
Adri2000 | is no actual "data" for most containers (galera being the main exception); i.e. it's possible to completely destroy/delete and then recreate from scratch (well as long as the OSA inventory is there) most of the containers. | 15:03 |
openstackgerrit | Merged openstack/openstack-ansible master: Remove glance-registry from docs https://review.opendev.org/739794 | 15:11 |
openstackgerrit | Dmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible master: Make collections installation more reliable https://review.opendev.org/758454 | 15:22 |
jrosser | ^ nice :) | 15:23 |
jrosser | noonedeadpunk: i was also thinking about the job failures we get downloading upper-constraints.txt, we do that a lot on each run | 15:24 |
jrosser | that could be recovered from the requirements git repo on the CI node and maybe we put it in /openstack/requirements/<sha>/upper-constraints.txt or something | 15:26 |
jrosser | then change the url to file://.... | 15:26 |
noonedeadpunk | well, I think we can set requirements as required project? | 15:26 |
noonedeadpunk | and then zuul will get it - the only thing we need is to update a single variable in ci? | 15:27 |
jrosser | right - but we set a specific sha and we'd need a specific step to extract just the right version of the file | 15:27 |
noonedeadpunk | well, yes... | 15:27 |
jrosser | and putting it in /openstack/... makes it also be inside all the lxc :) | 15:27 |
jrosser | it's nowhere near as bad as the galaxy thing but maybe the next most frequent thing that breaks | 15:28 |
recyclehero | this python_venv_build behaves very strangely. sometimes on upgrades it retries 5 times for the version it wants, even after some failed runs which should have them locally. | 15:30 |
recyclehero | and now this | 15:30 |
recyclehero | fatal: [infra1_horizon_container-9cb968e5 -> 172.29.239.138]: FAILED! => {"changed": false, "msg": "file not found: /var/www/repo/os-releases/21.0.1/horizon-21.0.1-constraints.txt"} | 15:30 |
recyclehero | this is the task; but I think if I rerun it, it disappears | 15:31 |
recyclehero | TASK [python_venv_build : Slurp up the constraints file for later re-deployment] **** | 15:31 |
openstackgerrit | Dmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible master: Make collections installation more reliable https://review.opendev.org/758454 | 15:32 |
*** gyee has joined #openstack-ansible | 15:39 | |
openstackgerrit | Dmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible master: Make collections installation more reliable https://review.opendev.org/758454 | 15:44 |
*** rpittau is now known as rpittau|afk | 15:52 | |
*** dave-mccowan has quit IRC | 15:53 | |
*** dave-mccowan has joined #openstack-ansible | 15:57 | |
recyclehero | I am blaming the https connection to releases.openstack.org | 15:57 |
recyclehero | what is the variable to make it retry like crazy rather than aborting the deployment after 5 tries? | 15:59 |
jrosser | there has been an outage at releases.openstack.org this afternoon | 16:03 |
jrosser | it should be back now | 16:04 |
recyclehero | not very stable | 16:04 |
recyclehero | but this one is sticky: TASK [python_venv_build : Slurp up the constraints file for later re-deployment] | 16:04 |
recyclehero | fatal: [infra1_horizon_container-9cb968e5 -> 172.29.239.138]: FAILED! => {"changed": false, "msg": "file not found: /var/www/repo/os-releases/21.0.1/horizon-21.0.1-constraints.txt"} | 16:04 |
recyclehero | what should I do? | 16:05 |
openstackgerrit | Dmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible master: Make collections installation more reliable https://review.opendev.org/758454 | 16:07 |
recyclehero | I don't even know what Slurp means | 16:09 |
recyclehero | But should it be an address like this? https://opendev.com/openstack/requirements/raw/21.0.1/horizon-21.0.1-constraints.txt | 16:18 |
recyclehero | it's not correct, I am trying to put it in there manually | 16:19 |
*** spatel has joined #openstack-ansible | 16:21 | |
*** MickyMan77 has quit IRC | 16:25 | |
openstackgerrit | Dmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible master: Fix upgrade jobs for bind-to-mgmt https://review.opendev.org/758461 | 16:25 |
recyclehero | ignore_errors: yes :( | 16:29 |
openstackgerrit | Dmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible-galera_server stable/stein: Bump galera version https://review.opendev.org/758462 | 16:30 |
openstackgerrit | Dmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible-galera_client stable/stein: Bump galera version https://review.opendev.org/758464 | 16:31 |
jamesdenton | jrosser with your haproxy and baremetal efforts, have you seen a need to override openstack_service_bind_address for uwsgi-based services? | 16:35 |
openstackgerrit | Dmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible-os_magnum master: Fix linter errors https://review.opendev.org/755569 | 16:55 |
openstackgerrit | Dmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible master: Fix upgrade jobs for bind-to-mgmt https://review.opendev.org/758461 | 16:58 |
jrosser | jamesdenton: no i've not - if you're having to override that maybe we have a wrong default somewhere | 17:02 |
jrosser | do you have an example? | 17:02 |
jamesdenton | well, i have haproxy running on the same node as ironic-api (on baremetal). default for uwsgi host ip is 0.0.0.0, but haproxy is already listening on the same ports | 17:03 |
jrosser | the patch that enabled bind-to-mgmt should have set openstack_service_bind_address to something like {{ management_address }} iirc | 17:04 |
jamesdenton | i might not have that patch | 17:04 |
noonedeadpunk | https://opendev.org/openstack/openstack-ansible/src/branch/master/inventory/group_vars/all/all.yml#L38 | 17:07 |
noonedeadpunk | it's only in master iirc | 17:07 |
jamesdenton | bueno. i overrode to ansible_host, but looks good | 17:08 |
jamesdenton | running ussuri here | 17:08 |
jamesdenton | thank you | 17:08 |
noonedeadpunk | jamesdenton: and you're running U on metal? | 17:08 |
noonedeadpunk | I guess you are if you're asking :) | 17:09 |
noonedeadpunk | it kind of means that we've failed with our plan to make it a CI-only thing jrosser | 17:09 |
jamesdenton | kinda sorta. this is actually my home lab setup, which started as a rocky? environment a couple of years ago and has been upgraded. slowly moving lxc->baremetal | 17:10 |
jamesdenton | we run Stein on baremetal in prod, but have dedicated haproxy nodes there | 17:10 |
noonedeadpunk | jamesdenton: well, I think we will need some migration plan then.... | 17:10 |
noonedeadpunk | as what I did was https://review.opendev.org/#/c/758461/ | 17:11 |
noonedeadpunk | (not sure it's working at all at the moment) | 17:11 |
noonedeadpunk | but not suitable for prod for sure | 17:11 |
jamesdenton | or stein environments are greenfield, fwiw | 17:11 |
jamesdenton | *our | 17:12 |
jamesdenton | my migration plan has been to copy the env.d files to /etc/openstack_deploy/env.d per service, set baremetal=true, remove the existing lxc container from inventory, regenerate inventory, and redeploy the respective service. But these oddities show up, like haproxy. i'd have to be much more methodical about it | 17:13 |
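For reference, the env.d override jamesdenton describes typically looks like this (a sketch; the service and container names are examples, and the flag in OSA env.d files is is_metal under properties):

```yaml
# /etc/openstack_deploy/env.d/cinder.yml (copied from inventory/env.d)
container_skel:
  cinder_api_container:
    properties:
      is_metal: true
```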
noonedeadpunk | Well, I meant a bare metal deployment with external haproxy/tons of overrides on U, going to V bare metal with our bind-to-mgmt | 17:15 |
noonedeadpunk | and the bad thing about that is ppl might have tons of different workarounds.... | 17:16 |
jamesdenton | agreed | 17:16 |
noonedeadpunk | offtopic - dunno how to comment on that (it's just the beginning of the work) https://github.com/ansible-collections/cloud.roles | 17:17 |
recyclehero | guys in the repo container /var/www/repo/os-releases/21.0.1 every service has 4 files but horizon is missing the constraints one? | 17:18 |
recyclehero | . | 17:18 |
jamesdenton | i'm curious as to which ton of overrides you're referring to for external haproxy | 17:18 |
noonedeadpunk | I was referring to overrides in the case of internal haproxy) | 17:19 |
jamesdenton | oh ok | 17:19 |
recyclehero | what are these .txt files | 17:19 |
recyclehero | ? | 17:19 |
jamesdenton | yeah, well, like you said, who knows how many people were/are actually doing that? | 17:19 |
jamesdenton | i kinda feel like if you go off the reservation, you're on your own, to an extent. But if there's a "sanctioned" architecture and migration plan, that's the one you test against | 17:20 |
*** andrewbonney has quit IRC | 17:20 | |
noonedeadpunk | well yes, fair | 17:20 |
recyclehero | openstack-ansible repo-install.yml -e "venv_rebuild=True" | 17:29 |
recyclehero | would this help? | 17:29 |
noonedeadpunk | recyclehero: help with what? | 17:31 |
noonedeadpunk | you have error installing horizon? | 17:31 |
recyclehero | yes | 17:32 |
recyclehero | it complains about a file missing | 17:32 |
recyclehero | horizon-21.0.1-constraints.txt | 17:32 |
recyclehero | it should be in the repo server, and I checked it's there for them all except horizon. | 17:33 |
noonedeadpunk | openstack-ansible os-horizon-install.yml -e "venv_rebuild=True" | 17:33 |
recyclehero | then continue with setup-openstack ? | 17:34 |
noonedeadpunk | well, either this or manually run the rest of the playbooks for the services that you want to deploy | 17:37 |
noonedeadpunk | https://opendev.org/openstack/openstack-ansible/src/branch/master/playbooks/setup-openstack.yml#L25-L50 | 17:38 |
noonedeadpunk | but for core I think horizon is one of the last ones | 17:38 |
recyclehero | noonedeadpunk: btw is this in 21.1.0? | 17:41 |
recyclehero | https://review.opendev.org/#/c/751724/ | 17:43 |
noonedeadpunk | um...... it's not, but I was pretty sure that it was... | 17:44 |
noonedeadpunk | ah wait | 17:44 |
*** MickyMan77 has joined #openstack-ansible | 17:48 | |
noonedeadpunk | recyclehero: yep, it's included | 17:49 |
noonedeadpunk | https://opendev.org/openstack/openstack-ansible/src/tag/21.1.0/ansible-role-requirements.yml#L44 | 17:49 |
recyclehero | great, how did you find out that this commit is included in that lxc_hosts version? | 17:53 |
noonedeadpunk | well by the commit SHA | 17:57 |
noonedeadpunk | you may see that for stable/ussuri that sha is exactly this commit https://opendev.org/openstack/openstack-ansible-lxc_hosts/commits/branch/stable/ussuri | 17:58 |
recyclehero | got it thanks | 17:59 |
MickyMan77 | Hi, I ran into a problem with cloud-init and SSH keys. "no authorized SSH keys fingerprints found for user debian" --> http://paste.openstack.org/show/799088/ | 18:00 |
MickyMan77 | is there any easy way to fix it? | 18:00 |
fridtjof[m] | Hey, I'm encountering a lot of instability in an environment. (especially when creating instances) nova-api-wsgi is losing connection to rabbitmq a lot - rabbit is closing the connections due to missing heartbeats, which would indicate that nova-api-wsgi is somehow failing to send those properly? | 18:01 |
fridtjof[m] | any ideas? | 18:01 |
fridtjof[m] | this env is on train 20.1.7 | 18:01 |
spatel | MickyMan77: did you try another distro? | 18:01 |
MickyMan77 | same issue with debian and centos 8 | 18:02 |
fridtjof[m] | whoa, i just got a huge exception in the log | 18:03 |
spatel | MickyMan77: I think the neutron metadata service provides that function, so make sure it's up and running - https://docs.openstack.org/nova/latest/user/metadata.html | 18:03 |
fridtjof[m] | http://paste.openstack.org/show/799093/ | 18:04 |
recyclehero | MickyMan77: these come to mind: 1) check metadata 2) maybe write some key directly to the volume using libguestfs tools and inspect more | 18:06 |
jrosser | MickyMan77: you did create/upload an ssh keypair? | 18:06 |
MickyMan77 | i did upload the key. | 18:07 |
spatel | fridtjof[m]: i had that kind of issue when i was using an F5 load-balancer and found the TCP timeout setting was different on the F5 and it was closing connections | 18:07 |
*** djhankb has quit IRC | 18:07 | |
fridtjof[m] | this is a plain haproxy setup unfortunately | 18:08 |
fridtjof[m] | just one infra node | 18:08 |
jrosser | MickyMan77: this is not good: 87.370969] cloud-init[419]: 2020-10-15 17:45:00,488 - util.py[WARNING]: No active metadata service found | 18:08 |
noonedeadpunk | fridtjof[m]: well, I'm wondering if everything is ok with the network, and assuming it is, I think it's worth checking whether you have some rabbit queue overflowing with unread messages | 18:08 |
MickyMan77 | I will take a look at metadata service | 18:09 |
spatel | yes, could be a network issue, mtu maybe, or an unhealthy MQ. | 18:09 |
noonedeadpunk | you can check that with `rabbitmqctl -p /nova list_queues | egrep -v "0$"` (but vhosts are nova,cinder,glance,neutron,etc) | 18:09 |
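Extending that check to every vhost at once (a sketch built on the command above):

```sh
# print queues with a non-zero message count in each vhost
for v in $(rabbitmqctl list_vhosts -q); do
  echo "== $v =="
  rabbitmqctl -p "$v" list_queues -q | egrep -v "0$"
done
```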
fridtjof[m] | network shouldn't be an issue, the containers are all on the same host in this case | 18:09 |
noonedeadpunk | could you be hitting OOM? :) | 18:09 |
fridtjof[m] | sorry for the dumb question, but how do I check rabbitmq's health? | 18:10 |
openstackgerrit | Merged openstack/openstack-ansible-os_octavia stable/rocky: Save iptables rules for all Debian derivative operating systems https://review.opendev.org/672556 | 18:10 |
fridtjof[m] | ah | 18:10 |
noonedeadpunk | fridtjof[m]: well, there's dashboard and cli util and... | 18:10 |
fridtjof[m] | 128GB on the infra host, oom shouldn't be a problem | 18:10 |
MickyMan77 | is the website https://docs.openstack.org/ down ? | 18:11 |
noonedeadpunk | well yeah | 18:11 |
fridtjof[m] | hm, queues are empty | 18:11 |
spatel | is this an AIO deployment? | 18:11 |
noonedeadpunk | MickyMan77: yep :( | 18:11 |
fridtjof[m] | not quite, it's one infra + storage host, and two compute hosts | 18:12 |
fridtjof[m] | what i find weird is that rabbitmq gives me this log output a lot: | 18:12 |
fridtjof[m] | 2020-10-15 18:09:11.722 [error] <0.17759.2> closing AMQP connection <0.17759.2> (10.1.70.227:50126 -> 10.1.70.29:5671 - uwsgi:9290:3e5c3b51-cd0f-4527-9035-cb29d21c23fd): | 18:12 |
fridtjof[m] | missed heartbeats from client, timeout: 60s | 18:12 |
fridtjof[m] | 227 is the nova-api container | 18:12 |
spatel | fridtjof[m]: it's normal i believe, i am seeing this in my network very randomly (i believe it's kind of a bug) | 18:13 |
spatel | run tcpdump on port 5671 | 18:13 |
fridtjof[m] | it correlates with nova-api constantly logging this: http://paste.openstack.org/show/799094/ | 18:14 |
noonedeadpunk | fridtjof[m]: oh, and you're running rabbit with ssl? | 18:14 |
fridtjof[m] | yeah, i'm just wondering if it's a load + timeout issue for my problems | 18:14 |
spatel | that is not a good error message | 18:14 |
fridtjof[m] | whatever the default in OSA is | 18:15 |
noonedeadpunk | I think by default we don't use ssl for rabbit/mysql for now | 18:15 |
noonedeadpunk | but the exception is thrown by the ssl module at the end | 18:16 |
noonedeadpunk | ah, we use ssl by default | 18:17 |
fridtjof[m] | hm, i'll change nova-api to use 5672 without ssl then | 18:19 |
noonedeadpunk | also, what I used to run to fix rabbit - openstack-ansible playbooks/rabbitmq-install.yml -e rabbitmq_upgrade=true | 18:20 |
noonedeadpunk | I think you can just change this for nova... | 18:20 |
noonedeadpunk | this re-creates queues so might fix things if rabbit starts acting weirdly | 18:21 |
fridtjof[m] | i did that yesterday (as part of a minor upgrade), but it didn't really help | 18:21 |
fridtjof[m] | oh, oops. I rebooted the entire host, not the container | 18:22 |
fridtjof[m] | that was dumb | 18:23 |
spatel | noonedeadpunk: why do we need to use -e rabbitmq_upgrade=true ? | 18:33 |
fridtjof[m] | alright, seems the reboot kind of helped | 18:35 |
fridtjof[m] | the rabbitmq related issues are gone (for now, at least) | 18:35 |
fridtjof[m] | but my base problem is still there >_> | 18:35 |
jrosser | jamesdenton: with the haproxy/metal/bind-to-mgmt there's kind of two things in play | 18:36 |
jrosser | without the bind-to-mgmt patches all the services were bound to 0.0.0.0, so that's the first thing that needs cleaning up, and those changes were a precursor to landing the haproxy+metal patch | 18:37 |
jrosser | however for the prod deploys you mention, where haproxy was on separate nodes, that wouldn't have been an issue, so i would think that the changes we have made in V might not be so impactful there | 18:38 |
fridtjof[m] | alright, i can pin down at least this issue now - creating an instance on an external network times out because network binding fails | 18:38 |
fridtjof[m] | and i can see a steady stream of exceptions on one compute http://paste.openstack.org/show/799095/ | 18:39 |
fridtjof[m] | "permission denied" sounds like the agent is misconfigured? | 18:39 |
noonedeadpunk | this sounds like missing sudo | 18:39 |
jrosser | but..... if there are deploys where great effort has been put into making haproxy co-exist with a metal deploy infra node, that's where the upgrade might be more tricky | 18:40 |
noonedeadpunk | do you have sudo binary on compute? | 18:40 |
fridtjof[m] | sudo is present | 18:40 |
fridtjof[m] | yeah | 18:40 |
*** djhankb has joined #openstack-ansible | 18:40 | |
noonedeadpunk | then probably some command/path is missing from /etc/neutron/rootwrap.d/ | 18:42 |
noonedeadpunk | but um, there are tons of stuff there | 18:42 |
noonedeadpunk | so if you can't tell which command it tries to execute before that stack trace, it's probably worth enabling debug and restarting the service to see on which exact command it fails to gain permissions | 18:43 |
fridtjof[m] | trying that | 18:45 |
fridtjof[m] | debug output doesn't give me the exact command line :/ | 18:50 |
fridtjof[m] | i'll resort to adding some log statements now lol | 18:50 |
recyclehero | guys I want to restore mariadb in a 2-node env. 1 controller, so 1-node galera. | 18:56 |
recyclehero | I am planning to | 18:56 |
recyclehero | 1) stop mariadb 2) restore from backup 3) /etc/init.d/mysql start --wsrep-new-cluster | 18:57 |
recyclehero | sounds okay? | 18:58 |
recyclehero | https://docs.openstack.org/openstack-ansible/ussuri/admin/maintenance-tasks.html#galera-cluster-recovery | 18:59 |
recyclehero | the "recovering primary component" link is broken on this page | 19:00 |
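recyclehero's plan, as commands (a sketch for a single-node galera; the init script is from his own message, and the backup restore step depends on the tooling used):

```sh
/etc/init.d/mysql stop
# restore the datadir (e.g. /var/lib/mysql) from backup here
/etc/init.d/mysql start --wsrep-new-cluster
```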
fridtjof[m] | huh, it's on both compute nodes though. | 19:00 |
fridtjof[m] | they both get "permission denied" | 19:01 |
*** cshen has quit IRC | 19:17 | |
*** cshen has joined #openstack-ansible | 19:17 | |
*** yann-kaelig has quit IRC | 19:18 | |
MickyMan77 | jrosser: when I check the metadata via haproxy I do get info.. --> http://paste.openstack.org/show/799098/ | 19:19 |
MickyMan77 | jrosser: but when I tcpdump the br-vlan nic I can't see any outgoing traffic from the newly created instances to the metadata service. | 19:20 |
*** gregwork has quit IRC | 19:26 | |
*** cshen has quit IRC | 19:26 | |
fridtjof[m] | okay, it's trying to add a link local ipv6 address to a brq interface...? and that gets a "permission denied" | 19:30 |
fridtjof[m] | looks like it's not a permission problem after all. | 19:32 |
fridtjof[m] | root@compute2-CP6NY03:~# ip a add fe80::ac59:20ff:fe4c:8cff/64 dev brqf7424189-aa | 19:32 |
fridtjof[m] | RTNETLINK answers: Permission denied | 19:32 |
fridtjof[m] | i replicated what it's trying to do | 19:32 |
fridtjof[m] | okay, seems like that address already exists on eth12 | 19:33 |
fridtjof[m] | I configured my environment according to https://docs.openstack.org/openstack-ansible/train/user/prod/example.html | 19:35 |
*** MickyMan77 has quit IRC | 19:48 | |
fridtjof[m] | okay, found the cause for the exception, but I have no idea why that is the case | 20:07 |
fridtjof[m] | when I launch an instance attached to a flat provider network (wired up on compute hosts through the br-vlan-veth/eth12 pair), a bridge "brq<guid>" gets created, and eth12 and the VM's tap adapter get added to it | 20:08 |
fridtjof[m] | then neutron-linuxbridge-agent is trying to add a link local ipv6 to the bridge, but /proc/sys/net/ipv6/conf/brqf7424189-aa/disable_ipv6 is 1 | 20:09 |
fridtjof[m] | and that's where the "permission denied" is coming from | 20:09 |
fridtjof[m] | now, the question remains: why is that set to 1, and why is it adding a link local v6 address to that adapter anyway? | 20:10 |
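What fridtjof pieced together can be reproduced like this (the commands and bridge name are from the log; the sysctl write at the end is for testing only):

```sh
# 1 here means the kernel refuses v6 addresses on the bridge (EACCES)
cat /proc/sys/net/ipv6/conf/brqf7424189-aa/disable_ipv6

ip a add fe80::ac59:20ff:fe4c:8cff/64 dev brqf7424189-aa
# RTNETLINK answers: Permission denied

# re-enable ipv6 on the bridge and retry the address add
echo 0 > /proc/sys/net/ipv6/conf/brqf7424189-aa/disable_ipv6
```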
Adri2000 | fridtjof[m]: hello, I just read through quickly, and that sounds like an issue I had very recently and spent a day debugging with the help of neutron developers... have a look at https://bugs.launchpad.net/neutron/+bug/1899141 and https://review.opendev.org/#/c/757107/ | 20:13 |
openstack | Launchpad bug 1899141 in neutron "Linuxbridge agent NetlinkError: (13, 'Permission denied') after Stein upgrade" [Medium,In progress] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez) | 20:13 |
fridtjof[m] | > Can you add debug logs in the "add_ip_address" method | 20:15 |
fridtjof[m] | oh my god, that's exactly what i ended up doing | 20:16 |
jamesdenton | jrosser the changes you made will be helpful in my scenario, with haproxy on the controller nodes. thank you | 20:16 |
fridtjof[m] | i wish i would've known earlier, i spent half my evening on this ;_; | 20:16 |
fridtjof[m] | it's the exact same issue, thank you Adri2000 | 20:16 |
Adri2000 | yw :) | 20:17 |
*** spatel has quit IRC | 20:25 | |
*** jeh has joined #openstack-ansible | 20:35 | |
*** nurdie has quit IRC | 20:38 | |
*** nurdie has joined #openstack-ansible | 21:09 | |
*** cshen has joined #openstack-ansible | 21:12 | |
*** cshen has quit IRC | 21:17 | |
*** jbadiapa has quit IRC | 21:17 | |
recyclehero | the "Create the neutron provider network facts" task works when neutron_provider_networks is not defined | 21:30 |
recyclehero | so it means I can't make a change to the network physical mappings? | 21:30 |
recyclehero | really simple question: how do I set host vars? I want to set neutron_provider_networks per host. I am looking for host_vars | 21:39 |
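For reference, OSA reads per-host overrides from /etc/openstack_deploy/host_vars/<hostname>.yml (a sketch; the hostname and mapping values are illustrative):

```yaml
# /etc/openstack_deploy/host_vars/compute1.yml
neutron_provider_networks:
  network_types: "vlan,flat"
  network_vlan_ranges: "physnet1:100:200"
```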
*** MickyMan77 has joined #openstack-ansible | 21:44 | |
*** gshippey has quit IRC | 21:48 | |
*** rh-jelabarre has quit IRC | 22:00 | |
*** yann-kaelig has joined #openstack-ansible | 22:46 | |
*** macz_ has quit IRC | 23:03 | |
*** cshen has joined #openstack-ansible | 23:13 | |
*** cshen has quit IRC | 23:17 | |
*** tosky has quit IRC | 23:23 |