noonedeadpunk | good morning | 07:26 |
---|---|---|
jrosser | o/ morning | 07:50 |
jrosser | looks like a missing dependency here https://zuul.opendev.org/t/openstack/build/35dd297230ae4b7280a7780d9be80671 | 07:58 |
jrosser | distutils for centos10 | 07:58 |
jrosser | i was also thinking that making the "user supplied" choice an explicit backend in the pki role could make things simpler | 08:03 |
jrosser | as we add extra backends it is kind of odd that user supplied is some special case of "standalone" | 08:03 |
jrosser | when actually it would not use the standalone cert generation at all | 08:03 |
jrosser | that would allow the vars that define a cert to become more uniform and reduce "these vars are for backend X" "these vars are for backend Y" that we seem to be going towards | 08:04 |
noonedeadpunk | I think that distutils is provided by setuptools, right? | 08:10 |
noonedeadpunk | I actually can recall covering it in some other molecule job | 08:11 |
jrosser | i think thats right yes | 08:11 |
noonedeadpunk | damiandabrowski: ^ | 08:16 |
damiandabrowski | hi! jrosser I'm not sure if I understand, can you prepare some example? | 08:18 |
frickler | distutils is provided by setuptools, but the latter is no longer automatically installed in a venv, you need to make that explicit | 08:18 |
jrosser | damiandabrowski: like here https://github.com/openstack/openstack-ansible-os_glance/blob/master/defaults/main.yml#L380 | 08:19 |
noonedeadpunk | ah | 08:20 |
noonedeadpunk | well, how would it be different from standalone then? | 08:20 |
jrosser | frickler: i think there is a good chance that this is actually missing in the system python, as the missing distutils error is happening inside execution of an ansible module on its target | 08:20 |
frickler | but that should already have been needed with noble, cf. https://review.opendev.org/c/openstack/kolla/+/907589 | 08:21 |
noonedeadpunk | frickler: yeah, we do it for sure in most cases, but missed some other in molecule it seems | 08:21 |
noonedeadpunk | or just docker images are slightly different in what they have for ubuntu/centos | 08:22 |
damiandabrowski | ahhh I see, yes that may be a good idea. | 08:22 |
damiandabrowski | If I understand you correctly, it will slightly reduce the "these vars are for backend X" "these vars are for backend Y" issue | 08:23 |
damiandabrowski | because we won't need a separate src (for standalone) and cert (for hashi_vault) parameter here, right? | 08:23 |
noonedeadpunk | but thinking about that, offloading this logic to the role might make sense indeed | 08:23 |
damiandabrowski | https://opendev.org/openstack/openstack-ansible-os_glance/src/commit/1f1dc604b1cf0c5b4c2cd8c502299399e6848b9a/defaults/main.yml#L405 | 08:23 |
jrosser | damiandabrowski: i was noticing in your patches that the name of the cert really is quite simple `myservice_{{ ansible_facts['hostname'] }}` | 08:23 |
jrosser | and theres no reason it cannot also be the same format for the standalone, except that it is expecting a path in order to also support user-supplied | 08:24 |
noonedeadpunk | we can just go with `name` instead then? | 08:24 |
noonedeadpunk | or well. | 08:24 |
jrosser | well something yes for sure, im not totally sure what is the best option | 08:25 |
noonedeadpunk | yeah, it can be name for standalone/vault and keep src for user-supplied? | 08:25 |
jrosser | "src" is pretty well understood from the copy module so that does make sense for the user-supplied | 08:25 |
noonedeadpunk | as we still need to pass a user-supplied path somehow, unless we restrict this to an expected structure | 08:25 |
noonedeadpunk | and say that in order to use user-supplied certs, you have to place them under the expected structure | 08:26 |
damiandabrowski | but maybe we can use exactly the same parameter for all backends - standalone, user-supplied and hashi_vault? | 08:26 |
damiandabrowski | in this case, src would perfectly fit IMO | 08:26 |
damiandabrowski | ofc, for user-supplied backend, user will need to override the default value of src with the expected cert path | 08:27 |
jrosser | unless the use of `name` in that case means you *have* to put it in the expected directory | 08:28 |
jrosser | and `src` allows you to place it wherever you wish | 08:28 |
noonedeadpunk | yeah | 08:28 |
damiandabrowski | should we work on this before or after merging hashi_vault patches? | 08:29 |
jrosser | anyway, i was just thinking that if we tidy up a few things (like noonedeadpunk did with the defaults/group vars typo) and also this, there might be quite some improvement for adding new backends cleanly | 08:29 |
jrosser | you did say that you had some trouble with overriding defaults and vars, i think we should address that first | 08:30 |
damiandabrowski | ouh, maybe I wasn't clear enough. I had troubles with this in other roles :D (for ex. ansible-hardening stores its variables in vars/ which is problematic) | 08:31 |
jrosser | well it's not problematic, so long as they're not supposed to be overridden :) | 08:31 |
jrosser | but yeah it needs care to make it correct | 08:31 |
jrosser | ahhah https://opendev.org/openstack/ansible-role-pki/commit/4e960a1083c71babc05a282a929f29d8f2f4df02 | 08:33 |
noonedeadpunk | oh, well :) | 08:34 |
noonedeadpunk | time to add it to redhat then | 08:34 |
opendevreview | Jonathan Rosser proposed openstack/ansible-role-pki master: Add python3-setuptools for redhat-10 based distros. https://review.opendev.org/c/openstack/ansible-role-pki/+/954213 | 08:35 |
damiandabrowski | fair point, yeah :D I will try to sort out the vars/ vs. defaults/ issue, but we will definitely need to have some backend-specific defaults in ansible-role-pki anyway | 08:37 |
damiandabrowski | and I wonder if it's okay to keep them in defaults/main.yml or move them to something like defaults/hashi_vault.yml and dynamically import this file when needed | 08:37 |
damiandabrowski | example: https://opendev.org/openstack/ansible-role-pki/src/commit/00545ffa46446372b0baf7fdb8a4b99e3eb5926a/defaults/main.yml#L205 | 08:38 |
jrosser | i think it is fine for them to be in defaults, so long as it is variables that are intended to be overridden | 08:39 |
jrosser | i would much rather we address this sort of thing https://opendev.org/openstack/ansible-role-pki/src/commit/00545ffa46446372b0baf7fdb8a4b99e3eb5926a/defaults/main.yml#L172-L180 | 08:39 |
damiandabrowski | okok, will do that | 08:43 |
damiandabrowski | I aim to apply improvements to my patches later this week | 08:43 |
noonedeadpunk | jrosser: btw on your comment here: https://review.opendev.org/c/openstack/openstack-ansible/+/953570 :) | 08:43 |
damiandabrowski | it would be nice to gather as much feedback as possible by then :D | 08:43 |
noonedeadpunk | the problem is that although squid is listening on *:3128, it's not actually responding on management_address, as it's configured after the service is started | 08:44 |
jrosser | oh because we change the order of setting up squid and creating the network? | 08:45 |
noonedeadpunk | but also, there's a race condition: openstack_hosts fails before networks are configured, as it tries to reach the proxy to install systemd_networkd | 08:45 |
noonedeadpunk | yeah | 08:45 |
noonedeadpunk | but the route I've added is /32 | 08:46 |
jrosser | right but that allows services to contact the external vip? | 08:46 |
noonedeadpunk | so it does not really give any escape path - just tells how to reach squid, and to do that not via lxcbr with nat, but via the mgmt network | 08:46 |
jrosser | unless i misunderstand...... | 08:47 |
noonedeadpunk | hm | 08:47 |
jrosser | the proxy scenario is kind of a two-for-the-price-of-one test | 08:47 |
jrosser | because it proves that everything goes via the proxy, or it will fail | 08:48 |
jrosser | and i think also it prevents misconfigured/broken services from directly using the external vip | 08:48 |
noonedeadpunk | well, they will go through proxy by default anyway then? | 08:49 |
jrosser | no, because it only sets deployment_environment_variables iirc | 08:49 |
noonedeadpunk | as public vip is in no_proxy anyway? | 08:49 |
jrosser | so there is no proxy config left over once ansible has run | 08:49 |
noonedeadpunk | well, then this idea to offload to openstack_host sucks.... | 08:51 |
noonedeadpunk | as with proxy ansible/apt wants to go through it right away | 08:51 |
noonedeadpunk | not giving any chance to provision networks after bootstrap_aio is completed | 08:51 |
jrosser | well yes | 08:53 |
jrosser | in a real environment the proxy would be something that just exists before you start and you point to it | 08:53 |
jrosser | not something ever provisioned by openstack-ansible | 08:53 |
jrosser | similar case pretty much for step-ca? | 08:53 |
noonedeadpunk | I think proxy is a bit unique here | 08:55 |
jrosser | i think that theres two things going on here | 08:55 |
noonedeadpunk | as proxy is pre-requirement for `apt` to install packages | 08:55 |
jrosser | the setup of networks in openstack-hosts does have benefits for automating more of the OSA specific things | 08:55 |
noonedeadpunk | while step-ca is needed waaaay later | 08:55 |
jrosser | but the other case is "test fixtures" that we need which are somehow network related, and squid is the most early of these it seems | 08:56 |
noonedeadpunk | well, yes. As also on production you won't host squid on any of openstack hosts anyway | 09:02 |
noonedeadpunk | it should be somehow in different perimeter | 09:02 |
jrosser | well maybe thats what we do | 09:10 |
noonedeadpunk | ok, can we start a bit from the beginning here? Just trying to think if I should abandon this patch based on that or maybe not | 09:10 |
jrosser | we have some other code that deploys test fixtures right at the start | 09:10 |
jrosser | and some other interface or whatever that's nothing to do with the OSA deploy | 09:10 |
jrosser | so it behaves just like it would in production | 09:10 |
noonedeadpunk | so like another bridge? | 09:11 |
noonedeadpunk | and add it to all containers? | 09:11 |
jrosser | and use the .102 IP perhaps so that theres no confusion with the VIP | 09:11 |
jrosser | oh hmm | 09:11 |
jrosser | i think it would be OK if the route you'd added was to something that was not the external VIP | 09:12 |
noonedeadpunk | so the issue is order of execution | 09:13 |
noonedeadpunk | we don't have this IP address up until systemd_networkd is restarted, but we already need it to install systemd_networkd by apt | 09:13 |
noonedeadpunk | and `bootstrap_host_public_address` is ansible_default_ip4_address | 09:14 |
noonedeadpunk | which we already have by default | 09:14 |
noonedeadpunk | and we can't add extra IP to the default interface, as it can be real IP and also restricted by firewall or allowed-address pairs | 09:15 |
noonedeadpunk | so I can't mess up with it | 09:15 |
noonedeadpunk | so the only 2 options I have to avoid starting squid on the public VIP are either to provision a completely separate network in aio as we do today, or to give up on the idea of using openstack_hosts for network provisioning | 09:16 |
noonedeadpunk | (but also squid is listening on *:3128, so it's not even startup, but just an ordering loophole) | 09:17 |
noonedeadpunk | I think that with current approach we do test proxy connection in quite a good way tbh. As dropping the route or doing smth wrong with squid would result in failures right away. | 09:18 |
noonedeadpunk | With that, we actually were not testing proxy connection to the public VIP anyway as we have this today: https://opendev.org/openstack/openstack-ansible/src/branch/master/tests/roles/bootstrap-host/templates/user_variables.aio.yml.j2#L322 | 09:19 |
jrosser | yes i agree it is pretty robust and catching errors that we would not find in other tests, even for non proxy deployments (like endpoint errors) | 09:19 |
noonedeadpunk | So services would walk to public IP without squid regardless | 09:19 |
noonedeadpunk | what the route changes, is that snat won't be used for reaching public vip | 09:21 |
noonedeadpunk | but we're keeping the interface we communicate over, as previously we were talking through mgmt interface as well | 09:22 |
noonedeadpunk | but yeah, dunno | 09:22 |
noonedeadpunk | It's not that I like the proposal, I just don't see a good option to solve that | 09:22 |
jrosser | also i think i removed lxcbr0/eth0 from the containers entirely in this config | 09:25 |
jrosser | so it's not possible to nat via the host at all | 09:25 |
jrosser | https://opendev.org/openstack/openstack-ansible/src/branch/master/tests/roles/bootstrap-host/templates/user_variables.aio.yml.j2#L324-L326 | 09:27 |
noonedeadpunk | ah, yes, true | 09:28 |
noonedeadpunk | I think that's why I added route :D | 09:29 |
jrosser | i think i'm more confused rather than less now | 09:29 |
jrosser | but eth1 should be the mgmt subnet | 09:29 |
noonedeadpunk | and this is what was guaranteeing public ip is not reachable | 09:29 |
jrosser | oh ok ok public ip is the node ip isnt it | 09:29 |
noonedeadpunk | and I think it was not reachable at all, as containers were not having a default route | 09:29 |
noonedeadpunk | well | 09:30 |
jrosser | so i'm getting confused about .100 / .101 for sure | 09:30 |
noonedeadpunk | what we can do... is to have a different IP for proxy for containers and bare metal | 09:30 |
noonedeadpunk | .100 and .101 are both management network | 09:30 |
jrosser | yes one is the VIP and one is the bind address for services? | 09:31 |
noonedeadpunk | as for containers, we can keep having proxy on management network, but do it on public for bare metal | 09:31 |
noonedeadpunk | yes | 09:31 |
noonedeadpunk | but for public we don't have VIP | 09:31 |
noonedeadpunk | so yeah, I can do is_metal check and use either bootstrap_host_public_address or bootstrap_host_management_address depending on the choice | 09:32 |
noonedeadpunk | for http_proxy | 09:32 |
noonedeadpunk | and then we don't need route | 09:33 |
jrosser | so for the "early" things it would use the public address that squid is already bound to | 09:34 |
noonedeadpunk | yeah | 09:34 |
noonedeadpunk | for late as well, given they're not running in lxc though | 09:34 |
noonedeadpunk | but public vip is local bound, so you can't prohibit reaching it anyway | 09:35 |
noonedeadpunk | or well | 09:35 |
noonedeadpunk | you can... | 09:35 |
noonedeadpunk | but you got it :) | 09:35 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Offload network provisionment for AIO to openstack_hosts https://review.opendev.org/c/openstack/openstack-ansible/+/953570 | 09:38 |
noonedeadpunk | so smth like that I guess | 09:38 |
noonedeadpunk | oh, not really | 09:39 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Offload network provisionment for AIO to openstack_hosts https://review.opendev.org/c/openstack/openstack-ansible/+/953570 | 09:40 |
noonedeadpunk | it could be a fair middle-ground.... | 09:41 |
noonedeadpunk | or indeed we should be having just a separate network/bridge for squid which we still do during bootstrap-aio | 09:42 |
noonedeadpunk | or abandon the patch :) | 09:42 |
opendevreview | Jonathan Rosser proposed openstack/ansible-role-pki master: Allow certificates to be installed by specifying them by name https://review.opendev.org/c/openstack/ansible-role-pki/+/954239 | 12:43 |
jrosser | damiandabrowski: ^ maybe something like this? then for the vault stuff you can also use the `name` key | 12:43 |
jrosser | i think also this is backward compatible, and we could go through the roles and migrate everything to use `name` | 12:44 |
damiandabrowski | i liked the idea of a "user-provided" backend a bit more, because it would allow us to have only one variable instead of name and src, but this also looks good at first glance | 13:18 |
damiandabrowski | I'll have a deeper look tomorrow | 13:30 |
opendevreview | Jonathan Rosser proposed openstack/ansible-role-pki master: Allow certificates to be installed by specifying them by name https://review.opendev.org/c/openstack/ansible-role-pki/+/954239 | 14:38 |
opendevreview | Jonathan Rosser proposed openstack/ansible-role-pki master: Allow certificates to be installed by specifying them by name https://review.opendev.org/c/openstack/ansible-role-pki/+/954239 | 14:45 |
opendevreview | Jonathan Rosser proposed openstack/ansible-role-pki master: Allow certificates to be installed by specifying them by name https://review.opendev.org/c/openstack/ansible-role-pki/+/954239 | 15:11 |
jrosser | damiandabrowski: even if we add a user-provided backend we have to support the vars that are in use today | 15:11 |
jrosser | even if it's temporary whilst migrating src -> name for most things | 15:12 |
jrosser | and one thing i'm not sure about is how we "enable" a user provided cert | 15:12 |
jrosser | as right now it's enough to just define `glance_user_ssl_cert` as the path to the file, and it will work | 15:13 |
jrosser | no messing with backend settings, or redefining the whole set of certs for glance just to set one of them to be user supplied | 15:13 |
jrosser | setting that var really implies that some 'user-supplied' functionality is used, however that is implemented | 15:14 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-os_glance master: Use 'name' to specify SSL certificates to the PKI role https://review.opendev.org/c/openstack/openstack-ansible-os_glance/+/954269 | 15:29 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-os_glance master: Use 'name' to specify SSL certificates to the PKI role https://review.opendev.org/c/openstack/openstack-ansible-os_glance/+/954269 | 15:51 |
damiandabrowski | yeah, you may be right about keeping src... | 16:22 |
jamesdenton_ | jrosser hello! looking back thru some old IRC logs... curious if you're using ASAP^2 in production with OVN | 17:26 |
jrosser | well | 17:27 |
jrosser | some time ago we (well andrewbonney actually) did a POC with it | 17:27 |
jrosser | and whilst it did basically work it was extremely fragile | 17:27 |
jrosser | i do have some notes here if there is anything in particular | 17:30 |
jamesdenton_ | Nothing in particular, no. Curious if the fragility was more on the driver side than say, neutron | 17:30 |
jamesdenton_ | Mainly curious to know what throughput you're seeing for GENEVE out of the box | 17:31 |
jamesdenton_ | and besides ASAP^2, are you doing anything special for offloading | 17:31 |
jamesdenton_ | i'm not crazy about DPDK, even after all this time | 17:31 |
jrosser | from what i can see in the test environment we were getting ~5Gbps out of the box between two VM on different hosts with the standard OVN setup | 17:37 |
*** | jpw_alt is now known as jpw | 17:37 |
jrosser | and i think we got that up to 18Gbps with the offloading as good as we could get at the time (~2 years ago) | 17:38 |
jrosser | i think that the fragility was in the composite of all-the-things that had to be just right at the same time | 17:39 |
jrosser | and all these things are pretty niche features, like vf-lag, the ovs offloading itself, and i think at the time offloaded security groups were a really new feature | 17:40 |
jrosser | and you need some pretty custom config of the NIC at boot, at every boot as well | 17:41 |
jamesdenton_ | that tracks pretty well. thank you | 17:43 |
jrosser | it would do 35Gbps between VM on the same host, and that was with 4x iperf threads all at ~85% CPU on the server side | 17:45 |
jrosser | so that might have been running out of CPU grunt at that point rather than network | 17:46 |
jamesdenton_ | yeah, exactly. We're seeing about 2.2Gbps between hypervisors on some Broadcom 25G NICs | 17:46 |
jamesdenton_ | and that's a single iPerf thread. I can scale that out horizontally but no one thread is > 2Gbps or so | 17:47 |
jamesdenton_ | What's interesting is this offload page mentions Broadcom NICs as supporting offloading - though i'm not sure if the implementation is the same. https://docs.openstack.org/neutron/latest/admin/config-ovs-offload.html | 17:48 |
jrosser | looks like we also struggled to set encapsulation + vlan tagging at the same time on what you'd call the "uplink" port from OVS | 17:49 |
jrosser | for this test we ended up having to have the tunnel traffic untagged up to the switch | 17:49 |
jrosser | interesting - no detail for the broadcom nic though | 17:50 |
jamesdenton_ | that doc doesn't really call out the vtep itself, huh. an important missing detail :D | 17:51 |
jrosser | damiandabrowski: why dont we also implement ca_chain and fullchain for the standalone backend | 18:10 |
jrosser | seems like we leak implementation details of hashi vault out with the need for dual handling of string or list for `type` | 18:10 |
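
Editor's note: the `name` versus `src` idea discussed above (08:23 to 08:28) can be sketched roughly as below. This is only an illustration under assumptions, not the ansible-role-pki implementation: the function name `resolve_cert_source`, the directory `/etc/example-pki/certs`, the `.crt` suffix, and the rule that `src` takes precedence over `name` are all hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical directory; the real role's on-disk layout may differ.
DEFAULT_CERT_DIR = PurePosixPath("/etc/example-pki/certs")


def resolve_cert_source(entry, cert_dir=DEFAULT_CERT_DIR):
    """Return the path a certificate would be read from.

    Assumed behaviour, following the discussion:
    - `src` is a user-supplied path and may point anywhere.
    - `name` means the cert must live in the expected directory.
    Letting `src` win when both are set is an assumption of this sketch.
    """
    if "src" in entry:
        return PurePosixPath(entry["src"])
    if "name" in entry:
        return cert_dir / f"{entry['name']}.crt"
    raise ValueError("certificate entry needs either 'name' or 'src'")


if __name__ == "__main__":
    # Generated cert (standalone or vault style): found by name.
    print(resolve_cert_source({"name": "glance_aio1"}))
    # User-supplied cert: an arbitrary path given through src.
    print(resolve_cert_source({"name": "glance_aio1", "src": "/opt/certs/glance.pem"}))
```

With a rule like this, existing deployments that only set a path keep working, while roles could migrate their generated certs to `name` over time.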