*** Guest2352 is now known as prometheanfire | 00:34 | |
*** prometheanfire is now known as Guest2640 | 00:35 | |
*** Guest2640 is now known as prometheanfire | 00:36 | |
dmsimard | > 08:14:33 <evrardjp> jrosser: yes I am not surprised about the "do not install from git sources" . But I am more puzzled nowadays on how we managed to make ansible more complex than what it should be ... | 00:48 |
---|---|---|
dmsimard | Do you happen to have a link for that ? There's a lot of users that install from git (instead of galaxy) and it's well documented here: https://docs.ansible.com/ansible/latest/user_guide/collections_using.html#install-multiple-collections-with-a-requirements-file | 00:49 |
dmsimard | I tried to find scary warnings but I don't see them | 00:50 |
opendevreview | Ian Wienand proposed openstack/openstack-ansible-tests stable/stein: Update Debian stable job https://review.opendev.org/c/openstack/openstack-ansible-tests/+/802816 | 00:51 |
dmsimard | If you have pain points/papercuts from stuff like that, feel free to reach out to me, happy to be a liaison via my role in the ansible community team | 00:51 |
opendevreview | David Moreau Simard proposed openstack/openstack-ansible master: DNM: Test ara 1.5.7rc3 with --diff https://review.opendev.org/c/openstack/openstack-ansible/+/696634 | 01:57 |
dmsimard | hopefully rc3 is good enough now haha | 02:01 |
dmsimard | the good news is that testing ara with OSA has helped uncover various bugs in rc1 and rc2 | 02:02 |
dmsimard | thanks <3 | 02:02 |
opendevreview | Satish Patel proposed openstack/openstack-ansible-os_neutron master: Adding https option for calico metadata service https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/802819 | 03:13 |
evrardjp | dmsimard: I have very little experience with collections. I am old school : ) To what I have seen from collections, their download and their install are, by default, built from archives of git repositories without the .git. i.e. not git repos. | 07:09 |
*** rpittau|afk is now known as rpittau | 07:10 | |
evrardjp | My comment about "how we managed to make ansible more complex than it should be" is a reference to OSA. OSA could be simpler. A few examples: we decided to have a dynamic inventory for one single reason: lxc containers needing a random mac. Another example: we have ansible-role-requirements and ansible-collections-requirements. Ansible has evolved, and we could leverage the latest ansible features to simplify things. But sadly, | 07:13 |
evrardjp | ansible itself is becoming more complex too... | 07:13 |
evrardjp | I am just looking for a way to _simplify_ where I can, to make operations simpler. However, targeting OSA for changes might not be the best first target for making operations simpler :D | 07:15 |
jrosser | you can't install roles and collections to paths-of-your-choice with a single requirements file | 07:23 |
jrosser | dmsimard: in the last week or so "sivel> fwiw, installing a collection from git, basically is a shortcut for developers, in that ansible-galaxy clones, builds the artifact, and then installs the artifact, throwing away the git clone" | 07:24 |
jrosser | and "sivel> installing a collection from git is not supposed to be used for production installs fwiw, iirc we document that it should only be used in development, and you should create actual artifacts instead" | 07:24 |
jrosser | evrardjp: i think that the publishing of collections is much more like pip/pypi than cloning git, if you follow the official way to push things to galaxy | 07:26 |
admin1 | o/ | 09:09 |
depasquale | ciao everybody | 10:53 |
depasquale | I need help with an issue with openstack-ansible galera-install playbook | 10:53 |
depasquale | I have reported the following bug -> https://bugs.launchpad.net/openstack-ansible/+bug/1938327 | 10:53 |
depasquale | can someone help me about this topic? | 10:54 |
jrosser | depasquale: you could paste the output when you run the galera playbook to paste.opendev.org and put the link here? | 11:20 |
depasquale | jrosser I will re-run right now and give you the output | 11:48 |
depasquale | <jrosser> can you please check the following https://paste.opendev.org/show/807787/ | 11:56 |
jrosser | depasquale: so this time it has run through and completed? | 11:57 |
depasquale | yes | 11:58 |
depasquale | it is stuck at the stage of creating users... | 11:58 |
depasquale | it will remain in this status for hours... without any further error | 11:58 |
jrosser | is it ubuntu focal? | 11:59 |
depasquale | Ubuntu 20.04.1 lts | 11:59 |
jrosser | feels like this https://jira.mariadb.org/browse/MDEV-24829 | 12:04 |
depasquale | uhm... with OSA 22.1.3 I was able to complete the setup-infrastructure playbook | 12:11 |
depasquale | it is very strange | 12:11 |
depasquale | is the fact that the other containers on infra2 and infra3 have no MariaDB installed consistent with your guess? | 12:12 |
jrosser | MDEV-24829 is not deterministic | 12:13 |
depasquale | ok. is there a way to downgrade mariadb to a 10.3.x version in openstack-ansible? | 12:14 |
depasquale | just for my understanding | 12:15 |
jrosser | the galera hosts are installed sequentially, not in parallel https://github.com/openstack/openstack-ansible/blob/master/playbooks/galera-install.yml#L44 | 12:15 |
jrosser | no it's not possible to downgrade | 12:15 |
depasquale | ok so do I have any workarounds? | 12:18 |
jrosser | give me a moment :) | 12:18 |
jrosser | can you check which version of mariadb is installed? | 12:19 |
depasquale | Great!! :) | 12:19 |
depasquale | ok let me check | 12:19 |
jrosser | i would expect 10.5.8 | 12:20 |
depasquale | https://paste.opendev.org/show/807788/ | 12:20 |
depasquale | the output of service mysqld status | 12:20 |
jrosser | then can you take a look at the output of journalctl -u mariadb | 12:22 |
jrosser | is the end of the log "normal" or filled with loads of errors about mutex? | 12:23 |
depasquale | sorry, it took some time because of a proxy error from paste.opendev.org... | 12:28 |
depasquale | I took just the last lines of my 3k line file | 12:28 |
depasquale | https://paste.opendev.org/show/807790/ | 12:28 |
depasquale | it seems there are several errors on mutex as you anticipated | 12:28 |
jrosser | right, so i think if you systemctl restart mariadb | 12:29 |
jrosser | then re-try the playbook it is possible it will succeed | 12:29 |
depasquale | ok let me try | 12:29 |
* jrosser curses recent mariadb releases :( | 12:29 | |
depasquale | I will restart mariadb and re-execute galera install | 12:29 |
depasquale | mariadb restart is stuck ahahahah :D | 12:31 |
jrosser | sometimes it can take a while | 12:31 |
depasquale | unbelievable... and depressive! :) | 12:31 |
jrosser | yeah | 12:31 |
jrosser | 10.5.9 is broken in different ways unfortunately | 12:32 |
jrosser | this has been horrible to deal with for us | 12:32 |
jrosser | oh yes and 10.5.10 doesn't work with cinder properly | 12:34 |
depasquale | wow! it looks promising for my installation :D | 12:35 |
jrosser | awesome | 12:35 |
depasquale | still waiting for the stop | 12:35 |
depasquale | .... | 12:35 |
jrosser | if you were using the stable/wallaby branch of OSA it would install mariadb 10.5.9, and we have a built in workaround in the playbooks for https://jira.mariadb.org/browse/MDEV-25030 | 12:36 |
jrosser | so that release is not going to suffer from sometimes mariadb deadlocking on startup, on focal | 12:37 |
depasquale | ok jrosser | 12:39 |
depasquale | so I will do the following: format everything on my servers and move to wallaby release | 12:40 |
depasquale | my goal is to find a reasonable and stable release to adopt in the distribution of a new cloud region for production... I start fearing about everything now :D | 12:41 |
jrosser | well i read your launchpad bugs | 12:41 |
depasquale | I really thank you for the help jrosser | 12:42 |
jrosser | also we've not made a point release of wallaby since 23.0.0 so i would recommend using stable/wallaby head of branch instead of 23.0.0 | 12:42 |
depasquale | ah ok | 12:43 |
jrosser | there is a point release every ~two weeks | 12:43 |
jrosser | that brings in all the upstream fixes to nova/cinder/..... and also any bugfixes on the stable branch in openstack ansible / ansible roles | 12:43 |
jrosser | just so the release model is clear | 12:43 |
depasquale | what if I go for a victoria release but not on ubuntu 20.04? | 12:44 |
jrosser | you could install victoria on bionic, as that's a supported OS for V | 12:44 |
spatel | I am running victoria with ubuntu 20.04 in production and its rock solid | 12:44 |
jrosser | depasquale: ^ there you go :) | 12:45 |
depasquale | :) | 12:45 |
spatel | I have 200 compute nodes in that cloud and didn't see any issue related to mysql / cinder, you name it :) | 12:45 |
jrosser | you're looking for a "reasonable and stable release", what we have is a "reasonable way to keep on a recent release" | 12:46 |
jrosser | spatel: no i was just explaining all the difficulties with having to pick a specific version of mariadb | 12:46 |
depasquale | spatel which version of ansible did you use? | 12:46 |
jrosser | the version of ansible is defined entirely by which version of OSA you use | 12:46 |
depasquale | because with jrosser we were discussing about the latest documentation that is osa 22.1.4 and it is not working for me on a small setup | 12:47 |
spatel | depasquale ansible 2.10.5 | 12:47 |
jrosser | depasquale: i think if you did several deployments it would work sometimes, not others, because it's a non-deterministic bug in mariadb | 12:47 |
depasquale | yes yes jrosser I was wrong :) my curiosity was OSA | 12:47 |
depasquale | not ansible | 12:47 |
jrosser | ah! | 12:47 |
depasquale | ;) | 12:48 |
jrosser | anyway, to answer your launchpad question - there are lots of people using OSA to deploy production clouds | 12:48 |
jrosser | i'm one, so is spatel | 12:48 |
spatel | depasquale i am running mariadb 10.5.8 | 12:48 |
depasquale | jrosser you were very clear thanks | 12:49 |
depasquale | spatel I envy you | 12:49 |
depasquale | :D | 12:49 |
spatel | I am running 4 large production clouds with OSA. In the last 4 years I had zero downtime or issues; again, it all depends on how you run everything. | 12:50 |
spatel | I have total 1000 compute nodes and soon going to open new datacenter :) | 12:50 |
jrosser | depasquale: OSA is made by deployers for deployers | 12:50 |
spatel | I am running my cloud using OSA + SRIOV for high performance network throughput | 12:50 |
jrosser | spatel started as a user and is now fixing stuff / writing new support which is awesome :) | 12:50 |
jrosser | and also making really cool blog posts for us all to learn from | 12:51 |
spatel | jrosser :) yes 4 year ago i was asking same question, like is this stable.. is this going to work.. ? | 12:51 |
spatel | but now i am so happy and keep going with OSA | 12:51 |
jrosser | i remember :) it is so nice to see you contributing now too +++1 | 12:51 |
depasquale | ok ok so you motivated me! I will do it! Let me format everything and start again from the beginning! | 12:52 |
spatel | depasquale you can see lots of my OSA related stuff here - https://satishdotpatel.github.io/blog/ | 12:52 |
jrosser | ^ don't be afraid to do that a few times | 12:52 |
jrosser | often it is quicker to wipe / run again than try to fix a mess, particularly for lab setups | 12:52 |
depasquale | thanks spatel you have another follower | 12:52 |
depasquale | :) | 12:52 |
spatel | u welcome.. don't worry. i was in same boat few years ago.. chasing people to get right answer | 12:53 |
jrosser | also OSA is a toolkit, not a shrink-wrap installer, there is massive flexibility to do whatever you like | 12:54 |
jrosser | but that does come at a price of having to dig in and understand the internals a bit | 12:54 |
depasquale | ok I hope to become an active member in some way. my openstack-queen is still working nicely... but I would love an automatic tool like osa to involve also other colleagues in | 12:54 |
depasquale | thanks guys | 12:55 |
jrosser | no problem, there's usually someone around here in the EU timezone so just ask if you get stuck | 12:55 |
depasquale | I will try and try again and let you know about the success or defeats I will face with | 12:55 |
depasquale | ok thanks | 12:55 |
spatel | +1 you need to understand the underlying structure of OSA; without that it will be a bit of a struggle. once you know how the OSA pieces are laid out, you will rule | 12:57 |
spatel | jrosser: I kicked off an export SCENARIO='aio_metal_calico' build in my lab to see where it's failing. i know it's metadata but I'm not sure how to tell it to use the https protocol, but let's see.. | 13:00 |
jrosser | i saw your patch | 13:00 |
jrosser | looks like felix doesn't understand https://...... | 13:00 |
jrosser | so kind of two options | 13:00 |
jrosser | drop the calico job | 13:00 |
jrosser | or override the thing that sets internal endpoint to https, just for the calico job | 13:01 |
spatel | hmm | 13:03 |
spatel | let me finish my lab and see if i can find work around otherwise i will drop calico | 13:03 |
spatel | when you say drop it means remove it or set to non-voting ? | 13:05 |
jrosser | perhaps something to discuss at the weekly meeting next week would be if we keep the calico job or not | 13:06 |
jrosser | but i think we can make it work by switching the internal endpoint back to http | 13:06 |
jrosser | there are overrides here which are only used for the calico test jobs https://github.com/openstack/openstack-ansible/blob/master/tests/roles/bootstrap-host/templates/user_variables_calico.yml.j2 | 13:07 |
spatel | jrosser that is what i want to test in my lab to point to internal and see if tempest pass if not then we can just drop calico | 13:09 |
spatel | I don't know how many people want to deploy openstack with calico ? | 13:09 |
jrosser | internal is https in master though | 13:09 |
spatel | Yes i think we moved everything to SSL vips recently | 13:10 |
jrosser | so my suggestion for the calico job is to switch the internal VIP back to http | 13:12 |
spatel | switch all internal vip back to http OR just nova-metadata vip? | 13:14 |
jrosser | interesting question | 13:16 |
jrosser | the easiest thing is to just switch them all back | 13:17 |
spatel | agreed.. let me see what we can do otherwise set it to non-voting to unblock others | 13:18 |
jrosser | it looks like the way to do it is to do it here https://opendev.org/openstack/openstack-ansible/src/branch/master/tests/roles/bootstrap-host/templates/user_variables.aio.yml.j2#L267-L269 | 13:18 |
spatel | yes, openstack_service_internaluri_proto: http | 13:19 |
jrosser | i am now thinking you can't do that in the calico specific user_variables file | 13:19 |
jrosser | because it has the same variable precedence as what comes from the user_variables.yml.j2 template | 13:20 |
spatel | hmm | 13:20 |
spatel | I am very curious why calico felix configuration doesn't support https protocol.. thinking to open bug for that | 13:21 |
spatel | I have opened bug to networking-calico so lets see if someone answer or fix it | 13:29 |
spatel | https://bugs.launchpad.net/networking-calico/+bug/1938447 | 13:42 |
spatel | jrosser look like someone know how to fix it :) https://bugs.launchpad.net/networking-calico/+bug/1938447 | 14:25 |
jrosser | spatel: maybe look at some of your non-calico stuff | 15:48 |
spatel | jrosser i think felix not going to work with SSL | 15:49 |
spatel | We have to change our haproxy endpoint to non-SSL | 15:49 |
jrosser | nova-metadata service(http) <- OSA haproxy (https) <- neutron haproxy on network node(http?) <- instance asks for metadata | 15:50 |
spatel | This is what calico felix doing, inserting iptables rules on compute node - -A cali-PREROUTING -d 169.254.169.254/32 -p tcp -m comment --comment "cali:J9-8BAIsw7Yc9tBK" -m multiport --dports 80 -j DNAT --to-destination 172.29.236.101:8775 | 15:50 |
jrosser | right, so from the VM perspective i think the metadata service is still expected to be http, even when the internal VIP is https? | 15:50 |
spatel | if felix using iptables then we can't tell it to use SSL | 15:50 |
spatel | check this thread - https://github.com/projectcalico/felix/issues/2933 | 15:51 |
jrosser | because normally there is haproxy on the network node, doing something more complex than just an iptables forward | 15:51 |
jrosser | yes i'm reading it | 15:51 |
spatel | :) | 15:51 |
jrosser | so it's the case that we have an http -> https translation in the neutron haproxy right? | 15:51 |
spatel | yes that should work | 15:51 |
jrosser | i think thats what we have today in a normal deployment without calico | 15:52 |
spatel | why don't we create one extra vip endpoint for nova-api with non-SSL | 15:52 |
spatel | keep everything SSL and just nova-api-metadata with http and https both | 15:52 |
jrosser | yeah, would have to look how to do that | 15:53 |
spatel | curious why we decided to go all SSL ? | 15:54 |
spatel | why don't we set it to non-voting and later when we have good solution we can remove non-voting | 15:59 |
spatel | i don't know how many people deploying openstack with calico and they are very dependent on CI job | 15:59 |
*** rpittau is now known as rpittau|afk | 16:03 | |
jrosser | spatel: well, the rework of all the SSL stuff i did was primarily aimed at the public endpoint | 16:07 |
jrosser | but noonedeadpunk did a load of followup work on that to also apply it to the internal endpoint | 16:07 |
jrosser | i expect there are some good reasons they have at city network to want to do that, perhaps regulatory / compliance issues depending on who the customers are? | 16:08 |
spatel | yes, public endpoint was already public earlier look like we just made it for all | 16:08 |
jrosser | evrardjp: ^ do you have insight into this? | 16:08 |
jrosser | well it was loads of extra work | 16:08 |
jrosser | in a way, doing the internal endpoint was harder than the external | 16:08 |
spatel | i am worried if i upgrade my openstack may this change break some stuff | 16:08 |
jrosser | the upgrade jobs are passing :) | 16:09 |
jrosser | but you should read/understand how the new PKI role is used | 16:09 |
jrosser | particularly if you want your own, or a trusted certificate on the internal endpoint | 16:09 |
jrosser | by default it will create a custom CA and certificates for internal | 16:10 |
spatel | assuming we are using a self-signed certificate, right? | 16:10 |
jrosser | for internal or external :) ? | 16:11 |
spatel | internal | 16:11 |
jrosser | it's a bit more complicated now | 16:12 |
spatel | we may need to think about renewing them also at some point, I would prefer if we have a knob to turn it on and off :) | 16:12 |
spatel | SSL is always difficult + hard to troubleshoot, specially with tcpdump etc.. | 16:13 |
spatel | assuming haproxy_ssl_all_vips: false will turn the SSL stuff off and make the deployment like before, right? but what will happen to the external vips? | 16:14 |
jrosser | i linked you three variables before | 16:16 |
jrosser | in master theres also support for different certs on the internal and external endpoints | 16:21 |
jrosser | i think this needs some new documentation writing | 16:21 |
spatel | https://opendev.org/openstack/openstack-ansible/src/branch/master/tests/roles/bootstrap-host/templates/user_variables.aio.yml.j2#L267-L269 | 16:25 |
spatel | for experiment i did haproxy_ssl_all_vips: false and re-run haproxy playbook but nothing happened | 16:25 |
jrosser | what about openstack_service_internaluri_proto ? | 16:26 |
spatel | i didn't set that but let me set all 3 and re-run playbook | 16:30 |
jrosser | you should see it make changes to the haproxy config and reload it | 16:31 |
spatel | no luck, i did set this https://paste.opendev.org/show/807800/ | 16:31 |
jrosser | do you mean really "nothing happened" ? | 16:32 |
spatel | re-run haproxy-server.yml and still nothing changed in haproxy.cfg | 16:32 |
spatel | aio1 : ok=38 changed=0 unreachable=0 failed=0 skipped=30 rescued=0 ignored=0 | 16:32 |
jrosser | and you're sure that var isnt also set somewhere else in /etc/openstack_deploy ? | 16:32 |
spatel | damn it you are right.. it was in same user_variables file but in different locations so i didn't scan all the lines.. look like it works | 16:35 |
spatel | so that is what we need to turn it on and off | 16:37 |
spatel | now all internal endpoints are non-SSL | 16:37 |
spatel | why don't we educate end users to use these 3 knobs to make their deployment super secure | 16:38 |
spatel | we have three options here to fix calico | 16:43 |
spatel | 1. add special stanza for nova_api_metadata to non-SSL | 16:43 |
spatel | 2. disable SSL for deployment and let user decide to enable or not (but it will still break calico) so not a good option | 16:44 |
spatel | 3. we can deploy small haproxy for calico on compute node to handle Metadata service, that is what neutron_ovn doing :) | 16:45 |
spatel | @jrosser ^ | 16:45 |
*** sshnaidm is now known as sshnaidm|afk | 18:30 | |
evrardjp | hey. I am not aware of this previous work. I am not surprised, however, with our compliance requirements. | 21:19 |
evrardjp | (it was in reference to SSL everywhere) | 21:20 |
opendevreview | David Moreau Simard proposed openstack/openstack-ansible master: DNM: Test ara 1.5.7rc4 with --diff https://review.opendev.org/c/openstack/openstack-ansible/+/696634 | 21:41 |
opendevreview | Ian Wienand proposed openstack/openstack-ansible-tests stable/stein: Update Debian stable job https://review.opendev.org/c/openstack/openstack-ansible-tests/+/802816 | 22:01 |
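The override mix-up around 16:32 to 16:35, where a variable was set twice in the same user_variables file and the haproxy playbook reported changed=0, is the kind of thing a small script could catch. The following is only a rough sketch, not part of OSA: the function name `find_duplicate_overrides` is hypothetical, and it assumes overrides live in `user_*.yml` files under `/etc/openstack_deploy` as plain, unindented `key: value` lines.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches a top-level YAML key such as "haproxy_ssl_all_vips: false".
# Assumption: overrides are plain, unindented "key: value" lines;
# commented-out and indented lines are deliberately ignored.
TOP_LEVEL_KEY = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)\s*:")


def find_duplicate_overrides(deploy_dir):
    """Return {variable: [(file name, line number), ...]} for every
    top-level variable defined more than once across user_*.yml files."""
    seen = defaultdict(list)
    for path in sorted(Path(deploy_dir).glob("user_*.yml")):
        lines = path.read_text().splitlines()
        for lineno, line in enumerate(lines, start=1):
            match = TOP_LEVEL_KEY.match(line)
            if match:
                seen[match.group(1)].append((path.name, lineno))
    return {name: places for name, places in seen.items() if len(places) > 1}


if __name__ == "__main__":
    # Hypothetical usage against the default deploy directory.
    duplicates = find_duplicate_overrides("/etc/openstack_deploy")
    for name, places in duplicates.items():
        print(name, places)
```

Running it before re-running haproxy-server.yml would list any variable, such as openstack_service_internaluri_proto, that appears in more than one place, which is what jrosser's question at 16:32 was probing for. It does not model Ansible's real variable precedence; it only flags candidates to inspect by hand.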