*** tosky has quit IRC | 00:02 | |
*** maharg101 has joined #openstack-ansible | 00:20 | |
*** macz_ has quit IRC | 00:25 | |
*** maharg101 has quit IRC | 00:26 | |
*** rfolco has joined #openstack-ansible | 00:26 | |
*** cshen has joined #openstack-ansible | 00:30 | |
*** cshen has quit IRC | 00:35 | |
*** cshen has joined #openstack-ansible | 01:15 | |
*** rfolco has quit IRC | 01:17 | |
*** cshen has quit IRC | 01:19 | |
*** ierdem has quit IRC | 01:31 | |
*** maharg101 has joined #openstack-ansible | 02:21 | |
*** maharg101 has quit IRC | 02:26 | |
*** priteau has quit IRC | 03:03 | |
*** cshen has joined #openstack-ansible | 03:15 | |
*** cshen has quit IRC | 03:19 | |
*** openstackgerrit has quit IRC | 03:22 | |
*** gyee has quit IRC | 04:00 | |
*** maharg101 has joined #openstack-ansible | 04:22 | |
*** maharg101 has quit IRC | 04:27 | |
*** spatel has joined #openstack-ansible | 04:36 | |
*** cshen has joined #openstack-ansible | 05:15 | |
*** dasp has quit IRC | 05:18 | |
*** cshen has quit IRC | 05:20 | |
*** evrardjp has quit IRC | 05:33 | |
*** evrardjp has joined #openstack-ansible | 05:33 | |
*** dasp has joined #openstack-ansible | 05:34 | |
*** spatel has quit IRC | 05:51 | |
*** cloudnull has quit IRC | 05:56 | |
*** cloudnull has joined #openstack-ansible | 05:57 | |
*** rpittau|afk has quit IRC | 06:11 | |
*** mnaser has quit IRC | 06:11 | |
*** mnaser has joined #openstack-ansible | 06:12 | |
*** rpittau|afk has joined #openstack-ansible | 06:12 | |
*** pcaruana has joined #openstack-ansible | 06:19 | |
*** evrardjp has quit IRC | 06:49 | |
*** evrardjp_ has joined #openstack-ansible | 06:49 | |
*** pto has quit IRC | 06:56 | |
*** pto_ has joined #openstack-ansible | 06:56 | |
*** cshen has joined #openstack-ansible | 07:07 | |
*** cshen has quit IRC | 07:12 | |
*** cshen has joined #openstack-ansible | 07:14 | |
*** openstackgerrit has joined #openstack-ansible | 07:15 | |
openstackgerrit | Dmitriy Rabotyagov proposed openstack/openstack-ansible-openstack_hosts master: Fix libsystemd version for Centos https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/766030 | 07:15 |
---|---|---|
*** cshen has quit IRC | 07:18 | |
*** sep has joined #openstack-ansible | 07:20 | |
*** cshen has joined #openstack-ansible | 07:22 | |
*** cshen has quit IRC | 07:26 | |
*** maharg101 has joined #openstack-ansible | 07:44 | |
*** jbadiapa has joined #openstack-ansible | 07:48 | |
*** miloa has joined #openstack-ansible | 07:50 | |
*** macz_ has joined #openstack-ansible | 07:51 | |
*** macz_ has quit IRC | 07:55 | |
*** rgogunskiy has joined #openstack-ansible | 07:58 | |
*** andrewbonney has joined #openstack-ansible | 08:11 | |
*** miloa has quit IRC | 08:11 | |
*** miloa has joined #openstack-ansible | 08:12 | |
*** miloa has quit IRC | 08:13 | |
*** miloa has joined #openstack-ansible | 08:14 | |
*** rpittau|afk is now known as rpittau | 08:17 | |
*** cshen has joined #openstack-ansible | 08:17 | |
openstackgerrit | Dmitriy Rabotyagov proposed openstack/openstack-ansible-openstack_hosts master: Fix libsystemd version for Centos https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/766030 | 08:21 |
*** tosky has joined #openstack-ansible | 08:47 | |
*** spatel has joined #openstack-ansible | 08:52 | |
*** spatel has quit IRC | 08:56 | |
*** pto has joined #openstack-ansible | 09:09 | |
*** pto_ has quit IRC | 09:13 | |
openstackgerrit | James Gibson proposed openstack/openstack-ansible-os_keystone master: Add security.txt file hosting to keystone https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/766437 | 09:27 |
*** macz_ has joined #openstack-ansible | 09:35 | |
*** macz_ has quit IRC | 09:40 | |
*** macz_ has joined #openstack-ansible | 10:33 | |
openstackgerrit | James Gibson proposed openstack/openstack-ansible-os_keystone master: Add security.txt file hosting to keystone https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/766437 | 10:34 |
*** macz_ has quit IRC | 10:38 | |
*** rgogunskiy has quit IRC | 10:45 | |
*** rgogunskiy has joined #openstack-ansible | 10:47 | |
*** ierdem has joined #openstack-ansible | 11:16 | |
*** avagi has joined #openstack-ansible | 11:20 | |
*** avagi has quit IRC | 11:21 | |
*** avagi has joined #openstack-ansible | 11:22 | |
*** rfolco has joined #openstack-ansible | 11:54 | |
*** mike44333 has quit IRC | 12:17 | |
openstackgerrit | James Gibson proposed openstack/openstack-ansible master: Add security.txt to haproxy frontend https://review.opendev.org/c/openstack/openstack-ansible/+/766457 | 12:24 |
*** rfolco has quit IRC | 12:38 | |
*** rfolco has joined #openstack-ansible | 12:42 | |
*** rgogunskiy has quit IRC | 12:48 | |
openstackgerrit | Marc Gariépy proposed openstack/openstack-ansible-repo_server master: Fix order for removing nginx file. https://review.opendev.org/c/openstack/openstack-ansible-repo_server/+/766257 | 12:50 |
mgariepy | good morning | 12:51 |
noonedeadpunk | o/ | 12:51 |
admin0 | \o | 12:52 |
mgariepy | anything interesting this morning ? | 12:53 |
avagi | hi, | 12:53 |
avagi | may I ask for help concerning keystone installation? | 12:53 |
avagi | Starting from yesterday I cannot install keystone, because the keystone-21.2.0-constraints.txt file is not existing in the repo container. | 12:53 |
noonedeadpunk | is it centos?:) | 12:53 |
avagi | yes | 12:53 |
noonedeadpunk | well.... with 8.3 things got broken. It was great rhel demo of how centos can be considered as "stable" now | 12:54 |
noonedeadpunk | https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/766030 should work | 12:54 |
noonedeadpunk | oh, btw, jrosser - it just passed (with disabled centos metal test) | 12:55 |
avagi | thanks, I am going to check .... | 12:55 |
noonedeadpunk | and centos metal passing here https://zuul.opendev.org/t/openstack/build/6ddee2acf46e49e9b50dff6db5e6d63e | 12:56 |
mgariepy | https://centos.rip/ | 12:56 |
noonedeadpunk | haha | 12:56 |
mgariepy | lol | 12:57 |
mgariepy | not funny haha | 12:57 |
noonedeadpunk | well, funny until you use centos in prod | 12:58 |
mgariepy | yep. | 12:58 |
mgariepy | i'm very glad i'm not using centos. | 12:58 |
admin0 | +1 | 12:59 |
*** sshnaidm has quit IRC | 13:09 | |
*** sshnaidm has joined #openstack-ansible | 13:09 | |
*** priteau has joined #openstack-ansible | 13:10 | |
openstackgerrit | Andrew Bonney proposed openstack/openstack-ansible master: Ensure kuryr repo is available within CI images https://review.opendev.org/c/openstack/openstack-ansible/+/765765 | 13:14 |
*** zigo has joined #openstack-ansible | 13:51 | |
*** spatel has joined #openstack-ansible | 13:55 | |
spatel | noonedeadpunk: https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/765906 you already have this fix in your patch so we don't need this one right? | 13:59 |
noonedeadpunk | spatel: yeah, sorry, we had to squash commits somehow to fix everything in one patch | 14:01 |
spatel | no worry, i will abandon it to clean up | 14:02 |
noonedeadpunk | yeah, thanks, it really helped a lot, since we just took your code | 14:02 |
spatel | +1 | 14:03 |
spatel | what is the status of 8.3 and victoria at present ? | 14:03 |
spatel | are we going to merge 8.3 with victoria release? | 14:03 |
noonedeadpunk | I was about to make branching when 8.3 released | 14:03 |
noonedeadpunk | yes, totally | 14:03 |
noonedeadpunk | otherwise all stuff is just stuck | 14:04 |
noonedeadpunk | as CI is broken | 14:04 |
noonedeadpunk | moreover, we need to backport fix to U | 14:04 |
openstackgerrit | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Ensure kuryr repo is available within CI images https://review.opendev.org/c/openstack/openstack-ansible/+/765765 | 14:07 |
*** newtim has joined #openstack-ansible | 14:14 | |
jrosser | it looks like these patches stand a chance to merge | 14:15 |
jrosser | problems with CI nodes being 'error' though | 14:15 |
noonedeadpunk | yeah. so annoying.... | 14:16 |
openstackgerrit | Dmitriy Rabotyagov proposed openstack/openstack-ansible-openstack_hosts master: Make CentOS 8 metal voting again https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/766425 | 14:17 |
openstackgerrit | Dmitriy Rabotyagov proposed openstack/openstack-ansible-openstack_hosts master: Make CentOS 8 metal voting again https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/766425 | 14:17 |
*** SecOpsNinja has joined #openstack-ansible | 14:21 | |
SecOpsNinja | hi to all. is there a way to force the haproxy Let's Encrypt renewal process? from what i found it is not using cron jobs | 14:25 |
SecOpsNinja | i used for the first one https://docs.openstack.org/openstack-ansible/latest/user/security/ssl-certificates.html#letsencrypt-certificates | 14:26 |
jrosser | SecOpsNinja: can you explain "the first one"? | 14:29 |
jrosser | you mean the first set of variables given there, as you already have horizon deployed? | 14:31 |
SecOpsNinja | jroll, the first one was created correctly using that info, but because i had multiple haproxy nodes without distributed storage the renewal didn't happen on the correct one, and now that i have removed all the other haproxy nodes i lost the renewed certificate. I tried to run "certbot renew" on the single remaining haproxy node and am trying to see why Hook command "/etc/letsencrypt/renewal-hooks/pre/haproxy-pre" returned error code 124 | 14:32 |
SecOpsNinja | from what i was seeing in the haproxy ansible role we aren't using cronjobs but the hooks of certbot itself for this. trying to understand why this doesn't renew | 14:33 |
* jroll does the periodic mis-ping wave to jrosser :P | 14:33 | |
SecOpsNinja | this is because i'm getting this "Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1056)')" | 14:33 |
jrosser | jroll: o/ | 14:33 |
jroll | \o | 14:33 |
jrosser | SecOpsNinja: ok, so shared storage is not required for letsencrypt | 14:34 |
SecOpsNinja | it is if you are using multiple haproxys on infra hosts like i was | 14:34 |
jrosser | each haproxy node is responsible for renewing its own certificate, certbot runs independently on each one | 14:34 |
*** pto has quit IRC | 14:35 | |
SecOpsNinja | yep the service could work on any node so the certificate would only be on the one that renewed and not on the others | 14:35 |
jrosser | the nodes are responsible for their own certificates | 14:35 |
jrosser | there is nothing shared, ever | 14:35 |
SecOpsNinja | because there isn't any way to synchronize all haproxys with the correct certificate unless it is copied to all of them or a distributed filesystem like nfs, ceph, glusterfs,... is used | 14:36 |
SecOpsNinja | but the public endpoint is using this lets encrypt cert | 14:36 |
SecOpsNinja | and what happened is that it renewed on a different haproxy that wasn't master, and the haproxy serving the public endpoint only had the expired one | 14:37 |
SecOpsNinja | now i only have 1 haproxy and glance until i resolve the problem of distributed storage like ceph | 14:37 |
jrosser | there is no synchronisation of the certificates | 14:37 |
jrosser | by deliberate design each haproxy (one or more) renews its own certificate independently | 14:38 |
*** tobberydberg has quit IRC | 14:38 | |
jrosser | keepalived is pointing the external VIP only ever to one of those | 14:38 |
SecOpsNinja | why deliberate design? | 14:38 |
jrosser | because synchronising the certs is difficult | 14:39 |
*** cshen has quit IRC | 14:39 | |
jrosser | it is easier to make it like this | 14:39 |
*** tobberydberg has joined #openstack-ansible | 14:39 | |
jrosser | there is an extra problem that you never know which haproxy has the external VIP | 14:40 |
jrosser | so you would have huge complexity making that one do the renewal and then distribute to the others | 14:40 |
jrosser | it is not known after deployment which is the active haproxy | 14:40 |
SecOpsNinja | yep the only solution would be a shared storage for these certificates if only the master one renews | 14:40 |
SecOpsNinja | supposedly you don't define one as master and the others as slaves? | 14:41 |
jrosser | keepalived / vrrp makes that choice | 14:41 |
SecOpsNinja | yep but you can define the one with higher priority... i think | 14:41 |
jrosser | absolutely, that can be done | 14:42 |
SecOpsNinja | but yeah i will keep that in mind next time i try to re-enable multiple haproxys | 14:42 |
jrosser | the http-01 challenge is difficult | 14:42 |
jrosser | because it needs to hit the challenge URL at the VIP | 14:43 |
SecOpsNinja | but i think the solution would be the configuration of a shared drive (for the storage of /etc/letsencrypt) if multiple haproxys are used | 14:43 |
jrosser | i'm not sure tbh, because that needs a filesystem style storage which otherwise does not exist in the deployment, and would become a SPOF | 14:44 |
jrosser | i am running the OSA haproxy role with HA letsencrypt in two production environments and also re-using it outside of OSA | 14:44 |
jrosser | it's working really solidly there | 14:45 |
jrosser | SecOpsNinja: there is a special backend in haproxy config which routes the http-01 challenge from the VIP to the particular haproxy that is renewing its cert | 14:47 |
jrosser | that allows all of them to renew even though the VIP is at a specific haproxy | 14:48 |
SecOpsNinja | SPOF? | 14:48 |
jrosser | single-point-of-failure | 14:48 |
jrosser | so we use haproxy itself to route the renewal challenge to the right certbot | 14:48 |
jrosser | the pre-hook is needed to stand up a temporary http server on the right port to swing the haproxy backend over to the node that is renewing | 14:49 |
jrosser | otherwise there is a race condition | 14:49 |
SecOpsNinja | but for that do you need to define a specific haproxy node to do only the renewal, or is it something outside of haproxy? | 14:49 |
jrosser | i'm not really following | 14:50 |
jrosser | each haproxy is using cron to run certbot renew | 14:50 |
SecOpsNinja | sorry, regarding the special backend | 14:50 |
jrosser | certbot cron runs on one of the haproxy nodes | 14:50 |
SecOpsNinja | i didn't find that crontab job | 14:51 |
jrosser | that starts a temporary server on the backend renewal port using python, in the pre-hook | 14:51 |
SecOpsNinja | but i will try to find it | 14:51 |
SecOpsNinja | yep, i saw that the pre-hook creates the http server | 14:51 |
jrosser | all the other haproxy nodes notice the backend port is up and direct http-01 challenges to that port | 14:51 |
jrosser | the one with the VIP receives the challenge and so the challenge goes to whichever haproxy node is renewing | 14:52 |
jrosser | the cron job is actually a systemd timer i think which comes with the ubuntu certbot package | 14:52 |
SecOpsNinja | yep but if the master changes the problem happens because the renewed certificate is on another node | 14:52 |
SecOpsNinja | ok didnt check the system cronjob :D | 14:53 |
jrosser | it does not matter which the master is | 14:53 |
jrosser | because all of the haproxy renew their certificates all the time | 14:53 |
SecOpsNinja | if you have 5 haproxys, are all of them going to request 5 renewals of the same certificate? | 14:53 |
SecOpsNinja | ok that is a way to do it if you don't hit the Let's Encrypt api limits for requests and renewals | 14:54 |
SecOpsNinja | but from what i found yesterday only one of the haproxys renewed the certificate and the others were still using the expired one in /etc/letsencrypt | 14:54 |
SecOpsNinja | that is why i asked about the workflow | 14:55 |
jrosser | renewals do not count against the rate limit but are subject to 5 a week for duplicates | 14:57 |
jrosser | if you were to add the unique fqdn of each haproxy instance as an extra --domain then they would not be duplicate and not subject to that limit | 14:57 |
SecOpsNinja | ah ok nice, didn't know that about renewals | 15:00 |
SecOpsNinja | in meantime i was able to find the problem, it was another parent haproxy that is not pointing to the correct ports :( | 15:01 |
jrosser | the design used is this one https://serversforhackers.com/c/letsencrypt-with-haproxy | 15:01 |
jrosser | but extended to multiple haproxy | 15:01 |
SecOpsNinja | jrosser, thanks for all the info :D | 15:01 |
jrosser | no problem :: anytime | 15:01 |
jrosser | :) even | 15:02 |
*** ericzolf has joined #openstack-ansible | 15:02 | |
jrosser | the whole thing is a balance really - renewal in one place only comes with some different complexities and how to make resilient | 15:02 |
jrosser | this is just a different more distributed approach | 15:03 |
SecOpsNinja | after resolving the problem of creating the vms caused by duplicated glance with unsynchronized storage, i think the next step would be to install ceph or glusterfs as a backend for all the services so i have fewer SPOFs | 15:03 |
SecOpsNinja | but that will probably be a new year resolution :) | 15:04 |
jrosser | also if you need to work a lot on haproxy/LE add --staging to this https://github.com/openstack/openstack-ansible-haproxy_server/blob/master/defaults/main.yml#L95 | 15:05 |
jrosser | then the rate limit is basically removed | 15:05 |
jrosser | but the certs are not valid, when you are happy, remove the flag and re-issue from the production LE endpoint | 15:06 |
SecOpsNinja | yep i used staging sometimes so i don't mess with the prod api limits | 15:06 |
SecOpsNinja | but thanks anyway for tip | 15:06 |
SecOpsNinja | still learning a lot about the structure of all the roles in openstack-ansible and there are a lot of them :D | 15:07 |
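The pre-hook idea jrosser describes above (stand up a throwaway HTTP listener so the haproxy health checks mark this node's renewal backend as up before certbot starts) can be sketched roughly as follows. This is only an illustration under assumptions, not the actual hook shipped by the haproxy_server role: the names `HealthHandler` and `temporary_health_server` are hypothetical, and the listener simply answers 200 to any GET or HEAD.

```python
import contextlib
import http.server
import threading


class HealthHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical handler: answer 200 so a health check sees the port as alive."""

    def _ok(self):
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        self._ok()

    def do_HEAD(self):
        self._ok()

    def log_message(self, *args):
        # Keep the sketch quiet; a real hook might log instead.
        pass


@contextlib.contextmanager
def temporary_health_server(port=8888, host="127.0.0.1"):
    """Serve on (host, port) for the duration of the with-block, then shut down.

    Yields the port actually bound (useful when port=0 is requested).
    """
    server = http.server.ThreadingHTTPServer((host, port), HealthHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        yield server.server_address[1]
    finally:
        server.shutdown()
        server.server_close()
        thread.join()
```

A hook built on this would hold the listener open for a few seconds, for example `with temporary_health_server(8888): time.sleep(5)`, and then release the port so that certbot's own standalone server can bind it and answer the actual challenge.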
*** ierdem has quit IRC | 15:27 | |
mnaser | noonedeadpunk, jrosser you might be interested by current topic on #openstack-tc | 15:36 |
*** macz_ has joined #openstack-ansible | 16:01 | |
*** pcaruana has quit IRC | 16:10 | |
noonedeadpunk | folks, can we vote please for https://review.opendev.org/c/openstack/openstack-ansible/+/766244 ? | 16:19 |
mgariepy | noonedeadpunk, done. | 16:21 |
noonedeadpunk | jrosser: ?:) | 16:21 |
noonedeadpunk | thanks mgariepy! | 16:21 |
andrewbonney | Got there first :) | 16:21 |
noonedeadpunk | awesome, thanks | 16:21 |
jrosser | ahha it is done | 16:21 |
openstackgerrit | Marc Gariépy proposed openstack/openstack-ansible-haproxy_server master: Add haproxy_frontend_only and haproxy_raw feature. https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/766504 | 16:21 |
mgariepy | any comments on that ^^ | 16:22 |
noonedeadpunk | except you could use single release note?:) | 16:22 |
noonedeadpunk | as it's a yaml list | 16:22 |
noonedeadpunk | but whatever | 16:23 |
mgariepy | lol yes sure haha | 16:23 |
openstackgerrit | Marc Gariépy proposed openstack/openstack-ansible-haproxy_server master: Add haproxy_frontend_only and haproxy_raw feature. https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/766504 | 16:25 |
jrosser | haproxy_raw is interesting i think, even if we like a haproxy_http_request as well | 16:38 |
jrosser | as you can't use config_template with haproxy config it's difficult to insert arbitrary stuff into front/back end that the template does not support | 16:38 |
jrosser | the way that the security.txt patches were done was influenced a bit by what was allowed in the template | 16:39 |
mgariepy | my idea is just to be able to set config whenever the haproxy version supports it without the need to modifying the template. | 16:40 |
mgariepy | i'll re-upload soonish. | 16:40 |
mgariepy | just stuck in a meeting right now. | 16:41 |
*** cshen has joined #openstack-ansible | 17:09 | |
*** cshen has quit IRC | 17:13 | |
admin0 | jrosser, i think this has passed -- https://review.opendev.org/c/openstack/neutron/+/765408 .. will it be in the next update and if yes, when is that update coming ? | 17:17 |
*** cshen has joined #openstack-ansible | 17:36 | |
*** ericzolf has quit IRC | 17:40 | |
*** jbadiapa has quit IRC | 17:57 | |
*** rpittau is now known as rpittau|afk | 18:04 | |
jrosser | admin0: that will be in the next osa ussuri tag, not sure when as that can be only after we fix all this centos8.3 mess | 18:05 |
admin0 | jrosser, in that case, i can override the neutron with this Change-Id: Icfcf8c5406cfdc47fabf012e82ed56c345a73af8 ? somewhere ? i don't recall the exact steps to do it | 18:10 |
*** miloa has quit IRC | 18:24 | |
*** CeeMac has joined #openstack-ansible | 18:34 | |
jrosser | admin0: here are the instructions https://docs.openstack.org/openstack-ansible/latest/user/source-overrides/index.html | 18:41 |
openstackgerrit | Marc Gariépy proposed openstack/openstack-ansible-haproxy_server master: Add haproxy_frontend_only and haproxy_raw feature. https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/766504 | 18:46 |
openstackgerrit | Marc Gariépy proposed openstack/openstack-ansible-haproxy_server master: Add haproxy_frontend_only and haproxy_raw feature. https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/766504 | 18:50 |
mgariepy | ouf. the workflow in the new gerrit is something lo. | 18:53 |
mgariepy | lol. | 18:53 |
mgariepy | let me know if you have other comments. | 18:54 |
mgariepy | noonedeadpunk, i would prefer to keep the haproxy_frontend_raw input instead of adding a haproxy_http_request as it will be easier to change the output with this key. | 18:55 |
mgariepy | the output of the template that is. | 18:55 |
SecOpsNinja | yep for some reason the certbot.service installed by the haproxy_server role is getting error 124 in the pre hook | 18:55 |
SecOpsNinja | trying to understand what is failing... | 18:55 |
ThiagoCMC | New CentOS: https://rockylinux.org ? lol | 18:56 |
mgariepy | ThiagoCMC, centos.rip is wayy better ;p ahah | 18:56 |
SecOpsNinja | ThiagoCMC, are you talking about this https://arstechnica.com/gadgets/2020/12/centos-shifts-from-red-hat-unbranded-to-red-hat-beta/ ? | 18:57 |
mgariepy | noonedeadpunk, if you want to add stuff like : https://www.haproxy.com/blog/four-examples-of-haproxy-rate-limiting/ | 18:57 |
ThiagoCMC | mgariepy, LOL | 18:57 |
ThiagoCMC | SecOpsNinja, pretty much | 18:58 |
SecOpsNinja | yep i stopped using centos a long time ago and now use debian stable or sometimes ubuntu | 18:59 |
ThiagoCMC | Me too... Using Debian since 1998 and Ubuntu since 2006 (desktops) | 19:00 |
ThiagoCMC | Never liked RH-based distros, way too complicated, specially.. Hmm... Upgrades. | 19:00 |
mgariepy | the issue with centos i have is that if you need vim you need an extra repo, and another one for nano.. that's just annoying. | 19:01 |
ThiagoCMC | Exactly. And unsafe. | 19:01 |
ThiagoCMC | Debian repo is huge, everything is in there | 19:01 |
ThiagoCMC | supported, stable, tested... | 19:01 |
mgariepy | i have been bitten once or twice with major pkg update in epel. | 19:02 |
mgariepy | it was not fun. | 19:02 |
spatel | admin0: did you verify this patch - https://review.opendev.org/c/openstack/neutron/+/765408 | 19:03 |
ThiagoCMC | Saw that happening too, in previous jobs... People fear upgrades because they stick with CentOS... Only if they knew Debian lol | 19:03 |
ThiagoCMC | spatel, I'm applying that patch manually in my cloud. Totally required for me! | 19:04 |
spatel | ThiagoCMC: let me know if it stop spitting logs. | 19:04 |
spatel | it looks very ugly | 19:04 |
ThiagoCMC | Yes, logs are clean now | 19:05 |
spatel | This patch should be on high priority to merge, not sure why its not taking enough traction | 19:05 |
jrosser | SecOpsNinja: paste the certbot errors if you think it might help | 19:06 |
spatel | ThiagoCMC: did you edit files to apply patch or use OSA way to push out branch/commit? | 19:06 |
ThiagoCMC | On launchpad, people say that it only affects CentOS but it affects Ubuntu as well. I sent a message there too. | 19:06 |
jrosser | the pre-hook should be runnable by hand i think to test it | 19:06 |
ThiagoCMC | spatel, `vim` FTW | 19:06 |
spatel | :) | 19:06 |
SecOpsNinja | yep, running the pre script manually it runs ok. it seems to be something in certbot, trying to get any info on that command, but i will paste | 19:07 |
jrosser | when you run the pre-hook you should also be able to see the backend become active with hatop or in the haproxy journal | 19:07 |
SecOpsNinja | part of the renew in verbose mode http://paste.openstack.org/show/800949/ | 19:08 |
jrosser | is this on ubuntu or debian? | 19:08 |
SecOpsNinja | it's debian and the haproxy part is running fine because i can see the service starting in all backends and becoming up... | 19:09 |
SecOpsNinja | debian 10 i believe | 19:09 |
SecOpsNinja | and i have installed using distro | 19:10 |
openstackgerrit | Marc Gariépy proposed openstack/openstack-ansible-haproxy_server master: Add haproxy_frontend_only and haproxy_raw feature. https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/766504 | 19:10 |
SecOpsNinja | i will try removing the pre hook and starting it manually and see if the hook still fails | 19:10 |
kleini | How do you handle vm.overcommit_memory? I still have default of 0, OpenStack thinks, 420G out of 512G are used, actually free are 158G and qemu gets "Cannot allocate memory" when probing for capabilities, so I can not spawn any new VMs on that machine although memory still seems to be available. | 19:11 |
SecOpsNinja | yep the certbot is having a strange error http://paste.openstack.org/show/800951/ | 19:13 |
jrosser | that is odd | 19:15 |
jrosser | which log are the 404 from? | 19:15 |
kleini | ThiagoCMC: I have one system, that I initially installed with Debian 1.3 (Bo) and upgraded it all the way along up to Buster. Underlying hardware needs to be replaced several times and filesystem evolved from ext2, ext3, reiserfs, ext4 now finally to ZFS. But I never had to "reinstall" it. | 19:15 |
admin0 | spatel, i am trying to override neutron to use that commit-id and then do a os-neutron run to validate that patch | 19:16 |
admin0 | or you mean do it manually ? | 19:16 |
admin0 | like edit the file (by hand) :D | 19:16 |
admin0 | jrosser, spatel - this way right ? https://gist.github.com/a1git/79d019baa855f7ef8d9ba0b47166ba62 | 19:19 |
admin0 | i need to put that in openstack_services.yml file | 19:19 |
spatel | via commit-id | 19:19 |
openstackgerrit | Marc Gariépy proposed openstack/openstack-ansible-haproxy_server master: Add haproxy_frontend_only and haproxy_raw feature. https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/766504 | 19:20 |
spatel | if you verify that process then let me know i will also push out | 19:20 |
spatel | admin0: looks good to me https://gist.github.com/a1git/79d019baa855f7ef8d9ba0b47166ba62 | 19:20 |
spatel | you can put that in user_variables.yml also | 19:21 |
admin0 | oh | 19:21 |
admin0 | :) | 19:21 |
admin0 | did not know that | 19:21 |
jrosser | SecOpsNinja: "The exit code is 124. This is the value timeout uses to indicate the program was terminated using SIGTERM" | 19:21 |
spatel | admin0: check this out - https://docs.openstack.org/openstack-ansible-os_neutron/latest/ | 19:21 |
SecOpsNinja | yep, it seems the problem could be in the webroot in the default certbot renew.... checking their documentation about that | 19:22 |
spatel | it has all those variables which you can overwrite using user_variables.yml file | 19:22 |
jrosser | SecOpsNinja: the pre-hook uses "timeout" which may terminate the script like that | 19:23 |
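The exit status 124 quoted above is what GNU coreutils `timeout` reports when the wrapped command runs past its limit. A minimal way to reproduce it outside certbot, assuming a Linux host with `timeout`, `sleep` and `true` on the PATH (the helper name `run_with_timeout` is made up for this sketch):

```python
import subprocess


def run_with_timeout(cmd, seconds):
    """Run cmd under GNU coreutils `timeout` and return the exit status.

    When the limit is reached, `timeout` sends SIGTERM to the command and
    itself exits with status 124.
    """
    return subprocess.run(["timeout", str(seconds), *cmd]).returncode


# A command that outlives its limit is terminated and reported as 124.
slow_status = run_with_timeout(["sleep", "5"], 1)

# A command that finishes in time passes its own exit status through.
fast_status = run_with_timeout(["true"], 5)
```

So a 124 from the pre-hook does not by itself mean the hook logic failed; it may only mean the hook was still running when the limit expired.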
admin0 | spatel, what do i run after that / the whole setup-everything ? | 19:24 |
admin0 | i think hosts is not needed | 19:24 |
admin0 | only infra and os-neutron ? | 19:24 |
SecOpsNinja | jrosser, yep i raised the timeout but it seems the problem is not that atm (despite it causing the error 124); running the pre hook separately from certbot, it seems to be a problem with the webroot files http://paste.openstack.org/show/800951/ | 19:24 |
jrosser | it's not webroot | 19:25 |
jrosser | it uses the certbot built in web server | 19:25 |
ThiagoCMC | kleini, riiight?! Try to upgrade CentOS 6 to 7! lol | 19:25 |
spatel | just run neutron playbook | 19:25 |
spatel | admin0: os-neutron-install.yml playbook is enough | 19:26 |
spatel | It will detect new branch and re-build / re-install neutron | 19:27 |
SecOpsNinja | jrosser, sorry? from what i understand the pre-hook only starts a web server, through python, and only after that certbot will create the acme challenge files to be exposed through the webserver, and that is not happening atm | 19:27 |
spatel | jrosser: is that correct? | 19:28 |
jrosser | spatel: yes I think it’s basically like a minor upgrade | 19:29 |
spatel | +1 | 19:33 |
*** maharg101 has quit IRC | 19:33 | |
spatel | Technically we only need this patch on the LinuxBridge agent, right? Not for neutron-server. | 19:38 |
spatel | But good to keep everything consistent across the board | 19:39 |
jrosser | SecOpsNinja: certbot is run in standalone mode https://github.com/openstack/openstack-ansible-haproxy_server/blob/master/tasks/haproxy_ssl_letsencrypt.yml#L72 | 19:39 |
jrosser | in that mode it has an internal web server which serves the challenge response, we are not using webroot | 19:40 |
jrosser | because on the haproxy node there is no web server | 19:40 |
jrosser | haproxy also cannot serve static content | 19:40 |
jrosser | we must be able to support deployments where haproxy has dedicated nodes, so this must be self contained | 19:40 |
jrosser | the pre-hook is there to give haproxy health checks sufficient time to detect which haproxy node is running certbot, by there being a valid http server on port 8888 | 19:41 |
SecOpsNinja | yep i understand that but don't understand the standalone workflow, because atm the responses for those files are 404 from the python webserver and that is the cause for this not renewing (not thinking about the problem of the error 124 atm) | 19:42 |
jrosser | i did ask which log the 404 was from..... | 19:42 |
admin0 | spatel, WARNING: Did not find branch or tag 'Icfcf8c5406cfdc47.. | 19:43 |
jrosser | SecOpsNinja: the python webserver should run for 5 seconds before certbot | 19:43 |
jrosser | it is never expected to handle the challenge | 19:44 |
SecOpsNinja | ok i do need to check how standalone works, and i will revert the change, but yep i still don't understand why this is failing | 19:45 |
spatel | hmm | 19:45 |
SecOpsNinja | going to read the certbot documentation because i don't understand why this is failing, and i reached the prod rate limit (produced an unexpected error: urn:ietf:params:acme:error:rateLimited) | 19:50 |
spatel | admin0: BRB | 19:50 |
jrosser | SecOpsNinja: what's kind of weird is that the pre-hook is run much later than i would expect in your certbot log | 19:51 |
admin0 | jrosser,to get this, is it Icfcf8c5406cfdc47fabf012e82ed56c345a73af8 or 2207b885449667a7bc377f427b9123165223dbde as the neutron_git_install_branch ? | 19:51 |
admin0 | to get https://review.opendev.org/c/openstack/neutron/+/765408 | 19:51 |
admin0 | or is it not there yet due to the zuul status in the end | 19:51 |
jrosser | well neither of those things..... | 19:52 |
openstackgerrit | Marc Gariépy proposed openstack/openstack-ansible-haproxy_server master: Add haproxy_frontend_only and haproxy_raw feature. https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/766504 | 19:52 |
jrosser | you need the git SHA of the commit on stable/ussuri | 19:52 |
SecOpsNinja | jrosser, one second i will post tthe full log of certbot renew --standalone --staging --break-my-certs -vv in http://paste.openstack.org/show/800954/ | 19:53 |
admin0 | oh from here ? https://opendev.org/openstack/neutron/commits/branch/stable/ussuri -- i don't see it merged yet | 19:53 |
admin0 | last update is 3 weeks ago | 19:53 |
admin0 | so its passed review .. but yet to be merged to the actual branches | 19:53 |
admin0 | due to some CI error | 19:53 |
jrosser | admin0: the patch here is not merged at all https://review.opendev.org/c/openstack/neutron/+/765408 | 19:56 |
jrosser | it fails CI | 19:56 |
jrosser | SecOpsNinja: you will get a 503 from haproxy when it does not think there is a backend available to serve the request | 19:57 |
jrosser | in one terminal watch hatop/haproxy journal and in the other run the cert issuance | 19:58 |
jrosser | you should see the letsencrypt backend come up in the haproxy log, if you don't there is something wrong between haproxy and the detection of port 8888 being active | 19:59 |
SecOpsNinja | jrosser, sorry but now i'm lost. if we only use the python webserver for 5 seconds before starting the service, to fool haproxy that a web service is active, how is the certonly standalone mode going to expose anything to outside requests? atm when certbot brings up the certbot standalone service the check for haproxy is already triggering. i will put a bigger trigger for fail in the parent | 20:00 |
SecOpsNinja | haproxy to see if it stays up and running longer | 20:00 |
jrosser | certbot standalone mode has an internal webserver inside certbot | 20:00 |
jrosser | certbot itself responds to the request | 20:01 |
mgariepy | reno is not fun. | 20:02 |
SecOpsNinja | ok but the parent haproxy (not the haproxy in the osa infra node) still has the connection up and running when certbot has already failed | 20:02 |
jrosser | i do not understand 'parent' | 20:03 |
openstackgerrit | Marc Gariépy proposed openstack/openstack-ansible-haproxy_server master: Add haproxy_frontend_only and haproxy_raw feature. https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/766504 | 20:03 |
*** andrewbonney has quit IRC | 20:05 | |
SecOpsNinja | jrosser, sorry i have 2 haproxy: one parent (in the gateway of the company) and one child (installed in the osa infra host). | 20:06 |
jrosser | oh this is new information! :) | 20:06 |
SecOpsNinja | from what i'm seeing in haproxy, in osa the check is inter 12000 rise 1 fall 2, and in the parent one it is rise 1 and fall 10 with the default 1 sec | 20:07 |
jrosser | then i think careful attention to the pre-hook timeout being greater than the total of the two haproxy 'rise' times plus some for luck would be needed | 20:08 |
SecOpsNinja | now with the same setting in both haproxy the channel is not open in time to trigger the haproxy check | 20:08 |
SecOpsNinja | yep going to test that | 20:08 |
jrosser | two haproxy like this increases the uncertainty time i think | 20:08 |
SecOpsNinja | the parent one has a very big fail count so it's not a problem (the connection goes up when the python webserver starts and only goes down long after the certbot failed error) | 20:10 |
SecOpsNinja | going to check osa haproxy settings and try to tune it | 20:10 |
jrosser | if the timeout is not quite long enough you might find sometimes it works / sometimes fails depending at exactly what time the renewal runs vs. the haproxy checks | 20:10 |
jrosser | we had exactly this without the timeout at all and it was very unreliable | 20:10 |
jrosser | the timeout got added later to make it robust | 20:11 |
SecOpsNinja | ok i will make a few tests to see what is making the 503 | 20:11 |
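The sizing rule jrosser describes above (a pre-hook timeout greater than the total of the two haproxy 'rise' times, plus some margin) can be sketched in Python. This is only an illustration: the function name, the 2 second margin, and the approximation that each haproxy needs about inter × rise to mark a backend up are assumptions, not anything taken from the openstack-ansible role.

```python
def min_prehook_timeout(checks, margin_s=2.0):
    """Return a rough lower bound, in seconds, for the pre-hook timeout.

    checks is a list of (inter_ms, rise) pairs, one per haproxy in the chain.
    Assumption: each haproxy needs roughly inter * rise to mark the backend up.
    """
    total_ms = sum(inter_ms * rise for inter_ms, rise in checks)
    return total_ms / 1000.0 + margin_s


if __name__ == "__main__":
    # Values quoted in the channel: osa haproxy "inter 12000 rise 1",
    # parent haproxy "rise 1" with a 1 second interval.
    chain = [(12000, 1), (1000, 1)]
    print(min_prehook_timeout(chain))
```

For the values quoted in the channel the sketch returns 15.0 seconds.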
mgariepy | should we abandon old patches (like 2yo one) ? | 20:16 |
SecOpsNinja | jrosser, yep there is something strange happening in osa haproxy because hatop stops getting data when certbot is running (the parent haproxy still has its connection open by the time certbot fails) | 20:17 |
jrosser | and you allow port 80 http through from the parent haproxy? | 20:18 |
SecOpsNinja | those broadcast messages are annoying like hell when you try to follow cli tool output heheh | 20:18 |
SecOpsNinja | yep | 20:18 |
jrosser | the other thing you can do is have a terminal open on a host on the internet | 20:18 |
SecOpsNinja | i tested running only the python webserver to check outside communication and it works | 20:18 |
jrosser | and you can curl the .well-known/acme-challenge endpoint without the hashed filename (because you don't know what it will be) | 20:19 |
jrosser | you can see that change from 503 to 404 and so on as the different phases of this happen | 20:19 |
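A small Python sketch of the check jrosser suggests: poll the challenge path from an outside host and watch the status code change. The hostname example.com and the helper names are placeholders, and the wording of the interpretations is an assumption; only the 503 meaning (haproxy does not see a backend) comes from the discussion above.

```python
import time
import urllib.error
import urllib.request


def describe(status):
    """Map a status code to a rough interpretation (assumed wording)."""
    if status is None:
        return "no HTTP response (connection failed or timed out)"
    if status == 503:
        return "haproxy sees no backend available"
    if status == 404:
        return "a backend answered, but there is no such challenge file"
    return "other response"


def fetch_status(url, timeout=3.0):
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # 4xx/5xx responses are raised as HTTPError
    except (urllib.error.URLError, OSError):
        return None


def watch(url, interval_s=1.0, rounds=60):
    """Print the status once per interval so phase changes are visible."""
    for _ in range(rounds):
        status = fetch_status(url)
        print(time.strftime("%H:%M:%S"), status, describe(status))
        time.sleep(interval_s)


if __name__ == "__main__":
    # example.com is a placeholder for the externally visible hostname.
    watch("http://example.com/.well-known/acme-challenge/")
```

Run it in one terminal while the renewal runs in another; the timestamps make it easier to line the changes up with the haproxy journal.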
SecOpsNinja | yep osa haproxy reload | 20:20 |
SecOpsNinja | let me see if i can find the logs regarding the lets encrypt api | 20:21 |
SecOpsNinja | ok i think i understand the problem | 20:24 |
spatel | admin0: any luck | 20:25 |
admin0 | spatel, i had to do it manually | 20:27 |
admin0 | but that error is gone | 20:27 |
admin0 | i was able to remove and add and test like 10 loadbalancers create and delete | 20:27 |
spatel | manually=hand edit? | 20:27 |
admin0 | :( | 20:28 |
admin0 | spatel, https://opendev.org/openstack/neutron/commits/branch/stable/ussuri | 20:28 |
admin0 | nothing has landed in the last 3 weeks there | 20:28 |
admin0 | until that commit id lands there, we are out of luck to do it via any sort of automation | 20:28 |
admin0 | well, we can ansible copy :) | 20:28 |
SecOpsNinja | jrosser, for some reason the haproxy in osa is reloading after the python webserver times out (5s), and in that time the request in the parent haproxy gets a 503 because the child haproxy is reloading | 20:28 |
spatel | interesting admin0 | 20:29 |
jrosser | that is very odd | 20:29 |
SecOpsNinja | yep true :D | 20:29 |
jrosser | admin0: you can fork the repo on github (it's literally one click) then apply the patch yourself to stable/ussuri | 20:29 |
jrosser | point to your own github URL and your own git SHA | 20:30 |
jrosser | SecOpsNinja: really the only thing that should reload haproxy is the renewal hook | 20:30 |
*** avagi has quit IRC | 20:30 | |
SecOpsNinja | even if a backend fails it cannot reload all servers in haproxy, that's very stupid | 20:31 |
SecOpsNinja | and explains very strange errors that i was having in my multiple haproxy nodes.... | 20:31 |
jrosser | i am not sure how to determine what made it reload | 20:32 |
jrosser | systemctl status will not be helpful? | 20:32 |
SecOpsNinja | checking the journal of haproxy, it always reloads after the lets encrypt backend goes down | 20:33 |
jrosser | does it do that even if you run the pre-hook by hand? | 20:34 |
SecOpsNinja | i do need to troubleshoot this but at least we found the cause of the non validation :D | 20:34 |
admin0 | jrosser, i have not tried it yet .. the last time i applied a patch was for https ssl from cloudnull .. years ago :) | 20:34 |
admin0 | i think i did not make notes and already forgot | 20:34 |
SecOpsNinja | jrosser, i will check that with only the python server | 20:35 |
admin0 | but copying files via ansible also seems OK to me .. they will be gone in the next ansible upgrade anyway | 20:35 |
spatel | jrosser: forking neutron github would be good idea | 20:35 |
admin0 | as it will be in a new venv | 20:35 |
admin0 | so no hassle of doing a fork and then patching and again touching the variables and remembering them | 20:35 |
admin0 | download 2 files, ansible copy to the compute, restart network-agents and you are good until you run the playbooks again/next upgrade | 20:36 |
admin0 | in my opinion | 20:36 |
SecOpsNinja | jrosser, nope. with only the python webserver the haproxy is not reloading so it can be the certbot standalone internal service ... very strange indeed.... | 20:36 |
spatel | admin0: that is a good idea, to just create a new playbook called post-neutron-patch.yml and run it along with the other playbooks to patch it until we have the fix in the branch | 20:37 |
jrosser | you could move the renewal hook aside so that cannot be run | 20:37 |
SecOpsNinja | jrosser, i will try that and will report after getting something to eat :D but again thanks for all your help troubleshooting this. i will report in 1h probably | 20:39 |
jrosser | sure, no problem, it is getting late here too | 20:39 |
jrosser | really interested to know what is going on though :) | 20:39 |
*** rfolco has quit IRC | 21:34 | |
*** yann-kaelig has joined #openstack-ansible | 21:35 | |
admin0 | spatel, done magnum already ? | 21:51 |
admin0 | i failed .. | 21:51 |
admin0 | might do trove next before moving back to magnum again | 21:51 |
spatel | what is the issue with magnum? | 21:52 |
spatel | I did magnum deployment on my lab but not in production | 21:52 |
spatel | I can't run it in production because we use a vlan based provider and k8s doesn't fit in that design | 21:53 |
admin0 | the issue with magnum is that there is no issue in installation .... and the master (instance comes up) .. but its not marked up in the script and never reaches the point where the nodes are created and the magic happens | 22:11 |
*** avagi has joined #openstack-ansible | 22:18 | |
SecOpsNinja | jrosser, i finally was able to find the cause of all problems :D | 22:34 |
SecOpsNinja | first we don't need the pre hook to start the python webserver because certbot standalone already does that | 22:34 |
SecOpsNinja | the reload was caused by the post hook that reloads haproxy after concatenating the lets encrypt cert for haproxy | 22:35 |
SecOpsNinja | the problem in the renew was in /etc/letsencrypt/renewal/*.conf, where the defined ip was different from the one expected by haproxy | 22:36 |
jrosser | without the pre-hook the renewal is unreliable because of a race condition between haproxy healthchecks and certbot | 22:36 |
SecOpsNinja | when i did the certbot renew -vv --http-01-address 172.30.100.253 --http-01-port 8888 it renewed it | 22:37 |
SecOpsNinja | i reduced the haproxy check to inter 1 rise 1 fall 10 and it works fine | 22:38 |
SecOpsNinja | tomorrow i will check why it put the wrong ip in the *conf file but at least it's renewed :D | 22:39 |
SecOpsNinja | second i need to find if haproxy needs a full reload when some cert changes | 22:40 |
jrosser | interesting | 22:40 |
jrosser | the reload is necessary iirc to pick up the new certificate | 22:40 |
SecOpsNinja | yep but at least we need to change the certbot cron job to not reload haproxy if the cert wasn't changed | 22:41 |
SecOpsNinja | In HAProxy 2.1 (Nov 2019), a new feature allows you to change TLS certificates without requiring a reload: https://www.haproxy.com/blog/dynamic-ssl-certificate-storage-in-haproxy/ | 22:42 |
SecOpsNinja | that could be interesting, to avoid some unnecessary reloads and dropped connections | 22:42 |
jrosser | i wonder if we are using the right hook | 22:45 |
jrosser | renewal vs. deploy | 22:45 |
SecOpsNinja | i will put a 0 timeout in the pre hook and see what happens, and try to troubleshoot the root cause of the wrong ip in the lets encrypt *.conf file | 22:46 |
SecOpsNinja | but now i can rest well knowing i solved the problem :) | 22:47 |
jrosser | perhaps this is wrong https://github.com/openstack/openstack-ansible-haproxy_server/blob/master/tasks/haproxy_ssl_letsencrypt.yml#L99 | 22:48 |
jrosser | and the path should be /etc/letsencrypt/renewal-hooks/deploy/haproxy-renew instead | 22:48 |
SecOpsNinja | i need to check the differences between both of them | 22:49 |
SecOpsNinja | https://github.com/certbot/certbot/issues/5935 | 22:50 |
jrosser | https://certbot.eff.org/docs/using.html?highlight=hook#renewing-certificates | 22:50 |
jrosser | that talks about all the different hook dirs | 22:50 |
SecOpsNinja | If you want your hook to run only after a successful renewal, use --deploy-hook in a command like this. | 22:51 |
SecOpsNinja | in https://certbot.eff.org/docs/using.html#renewing-certificates | 22:51 |
SecOpsNinja | When Certbot detects that a certificate is due for renewal, --pre-hook and --post-hook hooks run before and after each attempt to renew it. If you want your hook to run only after a successful renewal, use --deploy-hook in a command like this. | 22:52 |
SecOpsNinja | so yeah, we need to change it to the deploy hook for the concat and reload | 22:52 |
jrosser | i think this is relevant | 22:52 |
jrosser | "You can also specify hooks by placing files in subdirectories of Certbot’s configuration directory. Assuming your configuration directory is /etc/letsencrypt, any executable files found in /etc/letsencrypt/renewal-hooks/pre, /etc/letsencrypt/renewal-hooks/deploy, and /etc/letsencrypt/renewal-hooks/post will be run as pre, deploy, and post hooks respectively when any certificate is renewed with the renew | 22:53 |
jrosser | subcommand" | 22:53 |
SecOpsNinja | and possibly add an option for newer versions of haproxy to update the ssl cert dynamically without reloading the whole service | 22:53 |
jrosser | yes, that would end up being distro specific | 22:53 |
jrosser | are you running an haproxy version that can do that? | 22:54 |
SecOpsNinja | let me check that | 22:55 |
SecOpsNinja | because i'm using the debian version i'm on 1.8.19, but they already have 2.2.6 in backports | 22:57 |
jrosser | ok, i can certainly look at making a patch to fix the reload-when-not-renewing thing tomorrow | 22:58 |
jrosser | nice work finding that btw | 22:58 |
SecOpsNinja | it took a day but the cause was found | 22:58 |
jrosser | :/ apologies, thanks for persisting with it though | 22:59 |
SecOpsNinja | i don't like my system to have strange behaviours :D | 22:59 |
SecOpsNinja | and i was getting strange errors with haproxy disconnects, which is a problem because it's used by openstack services, so i had to try to find the cause :P | 23:00 |
jrosser | oh yes indeed | 23:00 |
jrosser | and if we can do reload-less new certificates with 2.x that will be great | 23:01 |
SecOpsNinja | yep, i will try to check that but it will probably be at the end of the year or in january | 23:01 |
SecOpsNinja | i don't think i have time to check that now but it will be on my todo list and a way to contribute to the project | 23:02 |
SecOpsNinja | ok i will go rest now but again thanks for all the help troubleshooting this | 23:02 |
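The deploy-hook idea discussed above could look roughly like this as a Python script placed in /etc/letsencrypt/renewal-hooks/deploy. It is a sketch, not the role's actual hook: the destination directory /etc/haproxy/ssl, the file naming, the fullchain-then-key order and the `systemctl reload haproxy` command are all assumptions. RENEWED_LINEAGE is the environment variable certbot sets for deploy hooks, pointing at the live directory of the certificate that was just renewed.

```python
import os
import subprocess
import sys


def build_haproxy_pem(lineage_dir, dest_path):
    """Concatenate fullchain.pem and privkey.pem into one file for haproxy."""
    parts = []
    for name in ("fullchain.pem", "privkey.pem"):
        with open(os.path.join(lineage_dir, name), "rb") as handle:
            parts.append(handle.read())
    # A newly created file gets owner-only permissions because it holds the key.
    fd = os.open(dest_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as out:
        out.write(b"".join(parts))
    return dest_path


def main():
    lineage = os.environ.get("RENEWED_LINEAGE")
    if not lineage:
        # Not called as a deploy hook, so nothing was renewed: do nothing.
        return 0
    # Assumed destination; adjust to wherever haproxy loads certificates.
    dest = os.path.join("/etc/haproxy/ssl", os.path.basename(lineage) + ".pem")
    build_haproxy_pem(lineage, dest)
    # Assumed reload command for a systemd-managed haproxy.
    subprocess.run(["systemctl", "reload", "haproxy"], check=True)
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Because deploy hooks only run after a successful renewal, a renew attempt that changes nothing would not reach the reload at all.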
*** SecOpsNinja has left #openstack-ansible | 23:10 | |
*** spatel has quit IRC | 23:17 | |
*** maharg101 has joined #openstack-ansible | 23:30 | |
*** maharg101 has quit IRC | 23:35 | |
*** rfolco has joined #openstack-ansible | 23:41 | |
*** yann-kaelig has quit IRC | 23:48 | |
*** tosky has quit IRC | 23:53 |