Thursday, 2021-07-29

*** Guest2352 is now known as prometheanfire		00:34
*** prometheanfire is now known as Guest2640		00:35
*** Guest2640 is now known as prometheanfire		00:36
dmsimard	> 08:14:33 <evrardjp> jrosser: yes I am not surprised about the "do not install from git sources" . But I am more puzzled nowadays on how we managed to make ansible more complex than what it should be ...	00:48
dmsimard	Do you happen to have a link for that ? There's a lot of users that install from git (instead of galaxy) and it's well documented here: https://docs.ansible.com/ansible/latest/user_guide/collections_using.html#install-multiple-collections-with-a-requirements-file	00:49
dmsimard	I tried to find scary warnings but I don't see them	00:50
opendevreview	Ian Wienand proposed openstack/openstack-ansible-tests stable/stein: Update Debian stable job https://review.opendev.org/c/openstack/openstack-ansible-tests/+/802816	00:51
dmsimard	If you have pain points/papercuts from stuff like that, feel free to reach out to me, happy to be a liaison via my role in the ansible community team	00:51
opendevreview	David Moreau Simard proposed openstack/openstack-ansible master: DNM: Test ara 1.5.7rc3 with --diff https://review.opendev.org/c/openstack/openstack-ansible/+/696634	01:57
dmsimard	hopefully rc3 is good enough now haha	02:01
dmsimard	the good news is that testing ara with OSA has helped uncover various bugs in rc1 and rc2	02:02
dmsimard	thanks <3	02:02
opendevreview	Satish Patel proposed openstack/openstack-ansible-os_neutron master: Adding https option for calico metadata service https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/802819	03:13
evrardjp	dmsimard: I have very little experience with collections. I am old school : ) To what I have seen from collections, their download and their install are, by default, built from archives of git repositories without the .git. i.e. not git repos.	07:09
*** rpittau|afk is now known as rpittau		07:10
evrardjp	My comment about "how we managed to make ansible more complex than it should be", is a reference to OSA. OSA could be simpler. A few examples: we decided to have a dynamic inventory for one single reason: lxc containers needing a random mac. Another example: we have ansible-role-requirements and ansible-collections-requirements. Ansible has evolved now, and we could leverage the latest ansible features to simplify things. But sadly,	07:13
evrardjp	ansible itself is becoming more complex too...	07:13
evrardjp	I am just looking for a way to _simplify_ where I can, to make operations simpler. However, targeting OSA for changes might not be the best first target for making operations simpler :D	07:15
jrosser	you can't install roles and collections to paths-of-your-choice with a single requirements file	07:23
jrosser	dmsimard: in the last week or so "sivel> fwiw, installing a collection from git, basically is a shortcut for developers, in that ansible-galaxy clones, builds the artifact, and the installs the artifact, throwing away the git clone"	07:24
jrosser	and "sivel> installing a collection from git is not supposed to be used for production installs fwiw, iirc we document that it should only be used in development, and you should create actual artifacts instead"	07:24
jrosser	evrardjp: i think that the publishing of collections is much more like pip/pypi than cloning git, if you follow the official way to push things to galaxy	07:26
admin1	o/	09:09
depasquale	ciao everybody	10:53
depasquale	I need help with an issue with openstack-ansible galera-install playbook	10:53
depasquale	I have reported the following bug -> https://bugs.launchpad.net/openstack-ansible/+bug/1938327	10:53
depasquale	can someone help me about this topic?	10:54
jrosser	depasquale: you could paste the output when you run the galera playbook to paste.opendev.org and put the link here?	11:20
depasquale	jrosser I will re-run right now and give you the output	11:48
depasquale	<jrosser> can you please check the following https://paste.opendev.org/show/807787/	11:56
jrosser	depasquale: so this time it has run through and completed?	11:57
depasquale	yes	11:58
depasquale	it is stuck at the stage of creating users...	11:58
depasquale	it will remain in this status for hours... without any further error	11:58
jrosser	is it ubuntu focal?	11:59
depasquale	Ubuntu 20.04.1 lts	11:59
jrosser	feels like this https://jira.mariadb.org/browse/MDEV-24829	12:04
depasquale	uhm... with OSA 22.1.3 I was able to complete the setup-infrastructure playbook	12:11
depasquale	it is very strange	12:11
depasquale	is the fact that the other containers in infra2 and infra3 have no MariaDB installed consistent with your guess?	12:12
jrosser	MDEV-24829 is not deterministic	12:13
depasquale	ok. there is a chance to downgrade mariadb to a 10.3.x version in openstack-ansible?	12:14
depasquale	just for my understanding	12:15
jrosser	the galera hosts are installed sequentially, not in parallel https://github.com/openstack/openstack-ansible/blob/master/playbooks/galera-install.yml#L44	12:15
jrosser	no it's not possible to downgrade	12:15
depasquale	ok so do I have any workarounds?	12:18
jrosser	give me a moment :)	12:18
jrosser	can you check which version of mariadb is installed?	12:19
depasquale	Great!! :)	12:19
depasquale	ok let me check	12:19
jrosser	i would expect 10.5.8	12:20
depasquale	https://paste.opendev.org/show/807788/	12:20
depasquale	the output of service mysqld status	12:20
jrosser	then can you take a look at the output of journalctl -u mariadb	12:22
jrosser	is the end of the log "normal" or filled with loads of errors about mutex?	12:23
depasquale	sorry, it took some time because I hit a proxy error on paste.opendev.org...	12:28
depasquale	I took just the last lines of my 3k line file	12:28
depasquale	https://paste.opendev.org/show/807790/	12:28
depasquale	it seems there are several errors on mutex as you anticipated	12:28
jrosser	right, so i think if you systemctl restart mariadb	12:29
jrosser	then re-try the playbook it is possible it will succeed	12:29
depasquale	ok let me try	12:29
* jrosser curses recent mariadb releases :(		12:29
depasquale	I will restart mariadb and re-execute galera install	12:29
depasquale	mariadb restart is stuck ahahahah :D	12:31
jrosser	sometimes it can take a while	12:31
depasquale	unbelievable... and depressing! :)	12:31
jrosser	yeah	12:31
jrosser	10.5.9 is broken in different ways unfortunately	12:32
jrosser	this has been horrible to deal with for us	12:32
jrosser	oh yes and 10.5.10 doesn't work with cinder properly	12:34
depasquale	wow! it looks promising for my installation :D	12:35
jrosser	awesome	12:35
depasquale	still waiting for the stop	12:35
depasquale	....	12:35
jrosser	if you were using the stable/wallaby branch of OSA it would install mariadb 10.5.9, and we have a built in workaround in the playbooks for https://jira.mariadb.org/browse/MDEV-25030	12:36
jrosser	so that release is not going to suffer from sometimes mariadb deadlocking on startup, on focal	12:37
depasquale	ok jrosser	12:39
depasquale	so I will do the following: format everything on my servers and move to wallaby release	12:40
depasquale	my goal is to find a reasonable and stable release to adopt in the distribution of a new cloud region for production... I start fearing about everything now :D	12:41
jrosser	well i read your launchpad bugs	12:41
depasquale	I really thank you for the help jrosser	12:42
jrosser	also we've not made a point release of wallaby since 23.0.0 so i would recommend using stable/wallaby head of branch instead of 23.0.0	12:42
depasquale	ah ok	12:43
jrosser	there is a point release every ~two weeks	12:43
jrosser	that brings in all the upstream fixes to nova/cinder/..... and also any bugfixes on the stable branch in openstack ansible / ansible roles	12:43
jrosser	just so the release model is clear	12:43
depasquale	what if I go for a victoria release but not on ubuntu 20.04?	12:44
jrosser	you could install victoria on bionic, as thats a supported OS for V	12:44
spatel	I am running victoria with ubuntu 20.04 in production and its rock solid	12:44
jrosser	depasquale: ^ there you go :)	12:45
depasquale	:)	12:45
spatel	I have 200 compute nodes in that cloud and didn't see any issue related mysql / cinder or anything name it :)	12:45
jrosser	you're looking for a "reasonable and stable release", what we have is a "reasonable way to keep on a recent release"	12:46
jrosser	spatel: no i was just explaining all the difficulties with having to pick a specific version of mariadb	12:46
depasquale	spatel which version of ansible did you use?	12:46
jrosser	the version of ansible is defined entirely by which version of OSA you use	12:46
depasquale	because with jrosser we were discussing about the latest documentation that is osa 22.1.4 and it is not working for me on a small setup	12:47
spatel	depasquale ansible 2.10.5	12:47
jrosser	depasquale: i think if you did several deployments it would work sometimes, not others, because it's a non-deterministic bug in mariadb	12:47
depasquale	yes yes jrosser I was wrong :) my curiosity was OSA	12:47
depasquale	not ansible	12:47
jrosser	ah!	12:47
depasquale	;)	12:48
jrosser	anyway, to answer your launchpad question - there are lots of people using OSA to deploy production clouds	12:48
jrosser	i'm one, so is spatel	12:48
spatel	depasquale i am running mariadb 10.5.8	12:48
depasquale	jrosser you were very clear thanks	12:49
depasquale	spatel I envy you	12:49
depasquale	:D	12:49
spatel	I am running 4 large production clouds with OSA. In the last 4 years I had zero downtime and no issues; again, it all depends on how you run things.	12:50
spatel	I have total 1000 compute nodes and soon going to open new datacenter :)	12:50
jrosser	depasquale: OSA is made by deployers for deployers	12:50
spatel	I am running my cloud using OSA + SRIOV for high performance network throughput	12:50
jrosser	spatel started as a user and is now fixing stuff / writing new support which is awesome :)	12:50
jrosser	and also making really cool blog posts for us all to learn from	12:51
spatel	jrosser :) yes 4 year ago i was asking same question, like is this stable.. is this going to work.. ?	12:51
spatel	but now i am so happy and keep going with OSA	12:51
jrosser	i remember :) it is so nice to see you contributing now too +++1	12:51
depasquale	ok ok so you motivated me! I will do it! Let me format everything and start again from the beginning!	12:52
spatel	depasquale you can see lots of my OSA related stuff here - https://satishdotpatel.github.io/blog/	12:52
jrosser	^ don't be afraid to do that a few times	12:52
jrosser	often it is quicker to wipe / run again than try to fix a mess, particularly for lab setups	12:52
depasquale	thanks spatel you have another follower	12:52
depasquale	:)	12:52
spatel	u welcome.. don't worry. i was in same boat few years ago.. chasing people to get right answer	12:53
jrosser	also OSA is a toolkit, not a shrink-wrap installer, there is massive flexibility to do whatever you like	12:54
jrosser	but that does come at a price of having to dig in and understand the internals a bit	12:54
depasquale	ok I hope to become an active member in some way. my openstack-queen is still working nicely... but I would love an automatic tool like osa to involve also other colleagues in	12:54
depasquale	thanks guys	12:55
jrosser	no problem, theres usually someone around here EU timezone so just ask if you get stuck	12:55
depasquale	I will try and try again and let you know about the success or defeats I will face with	12:55
depasquale	ok thanks	12:55
spatel	+1 you need to understand underlying structure of OSA without that it will be little struggling. once you know how OSA pieces laidout then you will rule	12:57
spatel	jrosser kick off export SCENARIO='aio_metal_calico' build in my lab to see where its failing, i know its metadata but not sure how to tell it to use https protocol but lets see..	13:00
jrosser	i saw your patch	13:00
jrosser	looks like felix doesnt understand https://......	13:00
jrosser	so kind of two options	13:00
jrosser	drop the calico job	13:00
jrosser	or override the thing that sets internal endpoint to https, just for the calico job	13:01
spatel	hmm	13:03
spatel	let me finish my lab and see if i can find work around otherwise i will drop calico	13:03
spatel	when you say drop it means remove it or set to non-voting ?	13:05
jrosser	perhaps something to discuss at the weekly meeting next week would be if we keep the calico job or not	13:06
jrosser	but i think we can make it work by switching the internal endpoint back to http	13:06
jrosser	there are overrides here which are only used for the calico test jobs https://github.com/openstack/openstack-ansible/blob/master/tests/roles/bootstrap-host/templates/user_variables_calico.yml.j2	13:07
spatel	jrosser that is what i want to test in my lab to point to internal and see if tempest pass if not then we can just drop calico	13:09
spatel	I don't know how many people want to deploy openstack with calico ?	13:09
jrosser	internal is https in master though	13:09
spatel	Yes i think we moved everything to SSL vips recently	13:10
jrosser	so my suggestion for the calico job is to switch the internal VIP back to http	13:12
spatel	switch all internal vip back to http OR just nova-metadata vip?	13:14
jrosser	interesting question	13:16
jrosser	the easiest thing is to just switch them all back	13:17
spatel	agreed.. let me see what we can do otherwise set it to non-voting to unblock others	13:18
jrosser	it looks like the way to do it is to do it here https://opendev.org/openstack/openstack-ansible/src/branch/master/tests/roles/bootstrap-host/templates/user_variables.aio.yml.j2#L267-L269	13:18
spatel	yes, openstack_service_internaluri_proto: http	13:19
jrosser	i am now thinking you can't do that in the calico specific user_variables file	13:19
jrosser	because it's the same variable precedence as what comes from the user_variables.yml.j2 template	13:20
spatel	hmm	13:20
spatel	I am very curious why calico felix configuration doesn't support https protocol.. thinking to open bug for that	13:21
spatel	I have opened bug to networking-calico so lets see if someone answer or fix it	13:29
spatel	https://bugs.launchpad.net/networking-calico/+bug/1938447	13:42
spatel	jrosser look like someone know how to fix it :) https://bugs.launchpad.net/networking-calico/+bug/1938447	14:25
jrosser	spatel: maybe look at some of your non-calico stuff	15:48
spatel	jrosser i think felix not going to work with SSL	15:49
spatel	We have to change our haproxy endpoint to non-SSL	15:49
jrosser	nova-metadata service(http) <- OSA haproxy (https) <- neutron haproxy on network node(http?) <- instance asks for metadata	15:50
spatel	This is what calico felix doing, inserting iptables rules on compute node - -A cali-PREROUTING -d 169.254.169.254/32 -p tcp -m comment --comment "cali:J9-8BAIsw7Yc9tBK" -m multiport --dports 80 -j DNAT --to-destination 172.29.236.101:8775	15:50
jrosser	right, so from the VM perspective i think the metadata service is still expected to be http, even when the internal VIP is https?	15:50
spatel	if felix using iptables then we can't tell it to use SSL	15:50
spatel	check this thread - https://github.com/projectcalico/felix/issues/2933	15:51
jrosser	because normally there is haproxy on the network node, doing something more complex than just an iptables forward	15:51
jrosser	yes i'm reading it	15:51
spatel	:)	15:51
jrosser	so it's the case that we have an http -> https translation in the neutron haproxy right?	15:51
spatel	yes that should work	15:51
jrosser	i think thats what we have today in a normal deployment without calico	15:52
spatel	why don't we create one extra vip endpoint for nova-api with non-SSL	15:52
spatel	keep everything SSL and just nova-api-metadata with http and https both	15:52
jrosser	yeah, would have to look how to do that	15:53
spatel	curious why we decided to go all SSL ?	15:54
spatel	why don't we set it to non-voting and later when we have good solution we can remove non-voting	15:59
spatel	i don't know how many people deploying openstack with calico and they are very dependent on CI job	15:59
*** rpittau is now known as rpittau|afk		16:03
jrosser	spatel: well, the rework of all the SSL stuff i did was primarily aimed at the public endpoint	16:07
jrosser	but noonedeadpunk did a load of followup work on that to also apply it to the internal endpoint	16:07
jrosser	i expect there are some good reasons they have at city network to want to do that, perhaps regulatory / compliance issues depending on who the customers are?	16:08
spatel	yes, public endpoint was already public earlier look like we just made it for all	16:08
jrosser	evrardjp: ^ do you have insight into this?	16:08
jrosser	well it was loads of extra work	16:08
jrosser	in a way, doing the internal endpoint was harder than the external	16:08
spatel	i am worried if i upgrade my openstack may this change break some stuff	16:08
jrosser	the upgrade jobs are passing :)	16:09
jrosser	but you should read/understand how the new PKI role is used	16:09
jrosser	particularly if you want your own, or a trusted certificate on the internal endpoint	16:09
jrosser	by default it will create a custom CA and certificates for internal	16:10
spatel	assuming we are using self-signed certificates, right?	16:10
jrosser	for internal or external :) ?	16:11
spatel	internal	16:11
jrosser	it's a bit more complicated now	16:12
spatel	we may need to think about renewing them at some point too, I would prefer if we have a knob to turn it on and off :)	16:12
spatel	SSL is always difficult + hard to troubleshoot, specially with tcpdump etc..	16:13
spatel	assuming haproxy_ssl_all_vips: false will turn the SSL stuff off and make the deployment like before, right? but what will happen to the external vips?	16:14
jrosser	i linked you three variables before	16:16
jrosser	in master theres also support for different certs on the internal and external endpoints	16:21
jrosser	i think this needs some new documentation writing	16:21
spatel	https://opendev.org/openstack/openstack-ansible/src/branch/master/tests/roles/bootstrap-host/templates/user_variables.aio.yml.j2#L267-L269	16:25
spatel	for experiment i did haproxy_ssl_all_vips: false and re-run haproxy playbook but nothing happened	16:25
jrosser	what about openstack_service_internaluri_proto ?	16:26
spatel	i didn't set that but let me set all 3 and re-run playbook	16:30
jrosser	you should see it make changes to the haproxy config and reload it	16:31
spatel	no luck, i did set this https://paste.opendev.org/show/807800/	16:31
jrosser	do you mean really "nothing happened" ?	16:32
spatel	re-run haproxy-server.yml and still nothing changed in haproxy.cfg	16:32
spatel	aio1 : ok=38 changed=0 unreachable=0 failed=0 skipped=30 rescued=0 ignored=0	16:32
jrosser	and you're sure that var isnt also set somewhere else in /etc/openstack_deploy ?	16:32
spatel	damn it you are right.. it was in same user_variables file but in different locations so i didn't scan all the lines.. look like it works	16:35
spatel	so that is what we need to turn it on and off	16:37
spatel	now all internal endpoints are non-SSL	16:37
spatel	why don't we educate end users to use these 3 knobs to make their deployment super secure	16:38
spatel	we have three options here to fix calico	16:43
spatel	1. add special stanza for nova_api_metadata to non-SSL	16:43
spatel	2. disable SSL for deployment and let user decide to enable or not (but it will still break calico) so not a good option	16:44
spatel	3. we can deploy small haproxy for calico on compute node to handle Metadata service, that is what neutron_ovn doing :)	16:45
spatel	@jrosser ^	16:45
*** sshnaidm is now known as sshnaidm|afk		18:30
evrardjp	hey. I am not aware of this previous work. I am not surprised, however, with our compliance requirements.	21:19
evrardjp	(it was in reference to SSL everywhere)	21:20
opendevreview	David Moreau Simard proposed openstack/openstack-ansible master: DNM: Test ara 1.5.7rc4 with --diff https://review.opendev.org/c/openstack/openstack-ansible/+/696634	21:41
opendevreview	Ian Wienand proposed openstack/openstack-ansible-tests stable/stein: Update Debian stable job https://review.opendev.org/c/openstack/openstack-ansible-tests/+/802816	22:01
