*** ultra is now known as Guest3688 | 04:08 | |
opendevreview | Damian Dąbrowski proposed openstack/openstack-ansible master: Add 'tls-transition' scenario https://review.opendev.org/c/openstack/openstack-ansible/+/885194 | 08:46 |
opendevreview | Damian Dąbrowski proposed openstack/openstack-ansible stable/zed: Add support for 'tls-transition' scenario https://review.opendev.org/c/openstack/openstack-ansible/+/885196 | 08:46 |
kleini | I am currently upgrading to Yoga. setup-infrastructure did not fail in staging but in production: "Host controller2-repo-container-lalala is not in 'Peer in Cluster' state". I will read now through GlusterFS setup guide but maybe you have some faster hints. | 08:48 |
opendevreview | Damian Dąbrowski proposed openstack/openstack-ansible master: Enable TLS on haproxy VIPs and backends by default https://review.opendev.org/c/openstack/openstack-ansible/+/885192 | 08:48 |
kleini | Stumbled over https://bugzilla.redhat.com/show_bug.cgi?id=1051992 restarting glusterd resolved peers in status "Accepted peer request" | 08:58 |
jrosser | i wonder if it is a race somehow | 09:01 |
jrosser | like if we bring the gluster peers up all at once then it might end up in a strange state | 09:02 |
kleini | Now I have controller2 and controller3 as peers listed in controller2-repo-container-something and controller3-repo-container-something. Of course, "disconnected". | 09:05 |
kleini | before 2- and 3-repo-container had just 1-repo-container as single peer | 09:05 |
kleini | no, 2-repo-container has just 3-repo-container as single peer and vice versa. | 09:08 |
kleini | sorry, my first time getting in touch with glusterfs | 09:09 |
kleini | finally solved. I had two issues: 1. glusterd required a restart on all nodes to get the transition from "Accepted peer request" to "Peer in Cluster"; 2. nodes 2 and 3 had each other as peer, while node 1 had 2 and 3 as peers. I had to add node 1 as a peer on node 2, which automatically added node 1 as a peer on node 3. | 09:24
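A rough sketch of that manual recovery as an ad-hoc play (host and container names are placeholders, and the `repo_all` target group is an assumption, not taken verbatim from this deployment):

```yaml
---
# Sketch only: restart glusterd everywhere to clear the "Accepted peer
# request" state, then probe the missing first node from node 2 so all three
# repo containers end up in "Peer in Cluster". Names are placeholders.
- hosts: repo_all
  gather_facts: false
  tasks:
    - name: Restart glusterd to recover stuck peer states
      ansible.builtin.service:
        name: glusterd
        state: restarted

    - name: Probe node 1 from node 2 so it joins the existing peering
      ansible.builtin.command: gluster peer probe controller1-repo-container
      when: inventory_hostname == 'controller2-repo-container'
      changed_when: true
```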
jrosser | interesting | 09:27 |
jrosser | iirc the `infra` CI jobs start 3 repo containers to check this | 09:28 |
noonedeadpunk | yup, it does | 09:28 |
noonedeadpunk | but we don't check things like idempotency there, for example | 09:29
noonedeadpunk | and on top of that, I guess environment wasn't brand new? | 09:29 |
kleini | maybe my second issue is caused by my first one. | 09:30 |
kleini | environment is old, initially deployed with S or T. second and third controller node have been added later. | 09:31 |
kleini | as my initial problem with peers in wrong states still does not seem to be resolved at all, would there be anything that could help to avoid such issues? | 09:41 |
kleini | anything in my logs now, that could help to avoid it? | 09:41 |
noonedeadpunk | To be frank - I'm not a huge expert in gluster. We mount cephfs instead. You can actually use any shared FS instead of gluster, even things like s3fs or nfs, since in the end the mount is configured using systemd-mount, and the gluster installation can be disabled with a variable | 09:45
jrosser | yes, gluster is not an absolute requirement at all, it's just that something is needed to provide a shared filesystem | 09:50 |
jrosser | you can disable it and provide something else here https://opendev.org/openstack/openstack-ansible/src/branch/master/inventory/group_vars/repo_all.yml#L25-L32 | 09:51 |
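By way of illustration, a user_variables.yml sketch of that kind of override, assuming an existing NFS export replaces gluster. The variable names here are assumptions for illustration only; the authoritative names are the ones in the linked repo_all.yml:

```yaml
---
# Assumption-heavy sketch: skip the bundled gluster setup and point the repo
# servers' systemd mount at an external NFS share instead. Verify the real
# variable names against the linked repo_all.yml before copying anything.
openstack_repo_server_enable_glusterfs: false   # hypothetical toggle name
repo_server_systemd_mounts:                     # hypothetical list name
  - what: "nfs.example.com:/srv/osa-repo"
    where: "/var/www/repo"
    type: "nfs4"
    state: "started"
    enabled: true
```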
jrosser | having said that, really it should work though | 09:52 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Bump ansible-core to 2.15.1 and collections https://review.opendev.org/c/openstack/openstack-ansible/+/886527 | 09:55 |
noonedeadpunk | yeah, it should for sure | 09:56 |
jrosser | maybe as simple as needing to run that in serial, but i don't really know | 10:00 |
opendevreview | Damian Dąbrowski proposed openstack/openstack-ansible master: Remove haproxy_accept_both_protocols from repo_all https://review.opendev.org/c/openstack/openstack-ansible/+/886586 | 10:00 |
kleini | will think about migrating that to a cephfs, but then the repo containers need an additional connection to the Ceph storage network. | 10:05
jrosser | glusterfs has been reliable for us | 10:06 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-openstack_hosts master: Remove Ubuntu 20.04 support https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/886595 | 10:10 |
kleini | I expect it to be reliable, too, at least according to what I've heard. I just now had issues in production with setting it up initially. | 10:11
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-lxc_hosts master: Cleanup old OS support https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/886597 | 10:14 |
noonedeadpunk | switching to ansible-core 2.15 won't be trivial... Partially because I did an unsupported thing with loop label lately :( | 10:33
jrosser | this was to prevent secret output (key) being in the ansible log? | 10:57 |
noonedeadpunk | well, more to suppress output and be more clear about what's passed to the module instead of just what we're looping against | 11:04
noonedeadpunk | as loop item != what we pass to the module, so kinda weird | 11:04 |
jrosser | i think using default(omit) on the label is pretty suspect too | 11:09 |
jrosser | even if a mapping were allowed | 11:09 |
noonedeadpunk | I think I was adding default('omit') ? Which would just print out "omit" | 11:32
noonedeadpunk | At least that was intention | 11:32 |
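To make the loop-label point concrete, a small self-contained sketch (names invented): loop_control.label controls what the log shows for each item, and a `default('omit')` there would just render the literal string "omit" for items without a name:

```yaml
---
# Illustrative only: label the loop with the item name so the secret value
# never shows up in output. default('omit') does not omit anything here; it
# simply prints "omit" when item.name is undefined.
- hosts: localhost
  gather_facts: false
  vars:
    service_secrets:
      - { name: keystone, key: "s3cr3t-1" }
      - { name: glance, key: "s3cr3t-2" }
  tasks:
    - name: Do something with each secret
      ansible.builtin.debug:
        msg: "would configure {{ item.name }}"
      loop: "{{ service_secrets }}"
      loop_control:
        label: "{{ item.name | default('omit') }}"
```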
noonedeadpunk | jrosser: have you ever been concerned about live migration speed? | 11:33
noonedeadpunk | As it seems that with enabled TLS for libvirtd it uses only single core for migration | 11:33 |
noonedeadpunk | while with it disabled it utilizes all. Which means that the migration speed is like that of a VM with multiqueue disabled | 11:34
noonedeadpunk | ~1.2gb | 11:34 |
jrosser | i dont recall us having seen an issue with that yet | 11:35 |
jrosser | andrewbonney: ^ ? | 11:35 |
jrosser | that is pretty sad though | 11:35 |
andrewbonney | I haven't seen it, but doesn't mean we don't have it | 11:35 |
andrewbonney | Our previous issues were all around using the wrong interfaces | 11:36 |
noonedeadpunk | Like https://listman.redhat.com/archives/libvirt-users/2018-May/msg00053.html | 11:36 |
jrosser | the second post there is talking about large volumes | 11:38 |
jrosser | andrewbonney: related to wrong interfaces, there are some patches regarding management address / ssh address which we need to go over | 11:38 |
noonedeadpunk | ah, well https://wiki.qemu.org/Features/Migration-Multiple-fds | 11:42 |
noonedeadpunk | what is fun though, is to see how encryption affects network throughput | 11:50 |
jrosser | similar https://bugzilla.redhat.com/show_bug.cgi?id=1968540 | 11:54 |
noonedeadpunk | as without tls enabled for live migrations (using plain tcp), I have like 20gbit/s vs 3gbit/s with enabled encryption | 11:54 |
noonedeadpunk | yup... | 11:56 |
noonedeadpunk | So it's a feature | 11:57 |
noonedeadpunk | Though now I'm much more sceptical about enabling internal tls by default | 11:57
noonedeadpunk | damiandabrowski: you might be interested in the topic as well | 11:57 |
jrosser | feels like that is a legitimate thing to talk to nova about, as it's not obvious that there is a big performance hit there | 12:00 |
noonedeadpunk | yeah, already pinged them as well. At least mentioning that in docs would be good I guess | 12:01 |
damiandabrowski | noonedeadpunk: but nova_qemu_vnc_tls is enabled by default already | 12:02 |
noonedeadpunk | but what is really nasty is that when you disable tls, you also cannot do authentication | 12:03
noonedeadpunk | as it's done through mTLS | 12:03 |
damiandabrowski | i didn't want to mention vnc :D | 12:03 |
noonedeadpunk | damiandabrowski: well, what i meant is that any encryption backed by gnutls will take a serious performance hit | 12:04
jrosser | isnt live migration an extreme case of that though? | 12:05 |
jrosser | normal API traffic will be spending much time doing $stuff in python anyway | 12:06 |
damiandabrowski | i can test that with rally if we have any doubts | 12:06 |
damiandabrowski | but i agree with jrosser | 12:06 |
jrosser | i am also not really familiar with the process model of uwsgi, if the tls is in a separate process/thread from the python parts | 12:08 |
noonedeadpunk | well, this feature for gnutls was merged for 3.7.3 which is exactly what you'd get in 22.04 | 12:11 |
noonedeadpunk | I should test in the sandbox | 12:11 |
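For reference, a hedged sketch of the knobs this discussion revolves around. `live_migration_with_native_tls` and `live_migration_scheme` are standard nova [libvirt] options and `nova_nova_conf_overrides` is the usual OSA override hook, but the override below is only an illustration of the throughput/confidentiality trade-off, not a recommendation (and, as noted above, dropping TLS also drops the mTLS-based authentication):

```yaml
---
# user_variables.yml sketch: fall back to a plain tcp migration stream when
# throughput matters more than encrypting the traffic. With native TLS the
# gnutls-backed stream discussed above is effectively single-threaded.
# Note: without TLS there is also no mTLS authentication of the peers.
nova_nova_conf_overrides:
  libvirt:
    live_migration_with_native_tls: false
    live_migration_scheme: tcp
```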
*** mgoddard- is now known as mgoddard | 12:28 | |
NeilHanlon | yayyy centos is no longer publishing to git.centos.org | 12:57
* NeilHanlon begins crying | 12:57 | |
noonedeadpunk | ┻━┻︵ \(°□°)/ ︵ ┻━┻ | 13:18 |
NeilHanlon | (https://www.redhat.com/en/blog/furthering-evolution-centos-stream) | 13:22 |
noonedeadpunk | So now source code is locked for rhel customers only? | 13:29 |
noonedeadpunk | rly? | 13:29 |
NeilHanlon | pretty much | 13:30 |
NeilHanlon | what a fun wednesday | 13:30 |
noonedeadpunk | guess we should discuss marking CentOS as experimental at this point | 13:31 |
noonedeadpunk | But I'm not really sure I understand what it means for Rocky? Not much I guess? | 13:32
NeilHanlon | for now it means we don't have updates... | 13:33 |
NeilHanlon | actively working on wtf we're going to do, though | 13:33 |
mgariepy | wow. | 13:34 |
noonedeadpunk | oh, wow | 13:35 |
mgariepy | rebase rocky on.. debian ? | 13:37 |
noonedeadpunk | lol | 13:37 |
noonedeadpunk | but that's not fun at all to be frank | 13:39 |
mgariepy | i know. | 13:39 |
noonedeadpunk | obviously that's a move against derivatives | 13:39
mgariepy | ibm is so evil. imo. | 13:41 |
spatel | what could be the issue for cinder volume stuck in detaching state ? | 13:44 |
spatel | I am able to create/attach but detach just stuck | 13:44 |
noonedeadpunk | does it get detached from nova point of view? | 13:45 |
noonedeadpunk | as attachment is stored both in cinder and nova databases | 13:46 |
spatel | checking nova logs | 13:47 |
noonedeadpunk | and depending on what command you use to detach, flows might be different. Or well, they could be different until the latest os-brick OSSA vulnerability got covered | 13:47
spatel | noonedeadpunk this is the error I am getting on nova-compute.log - https://paste.opendev.org/show/bAbgQN0Qpf9do7pp0tMj/ | 13:51 |
noonedeadpunk | ask kolla ヽ(。_°)ノ | 13:52 |
spatel | haha, its my lab | 13:52 |
spatel | my production still running on openstack-ansible but some small environment using kolla.. :( | 13:53 |
noonedeadpunk | but essentially the latest ossa fix made volume detach commands issued directly to cinder invalid | 13:53
noonedeadpunk | and I think you should always use nova api to detach volumes since then | 13:54 |
spatel | Hmm! I am using horizon to detach. You are saying use CLI? | 13:54 |
spatel | everything was working fine until yoga, but as soon as I upgraded to zed I encountered this issue. | 13:57
spatel | I will open a bug and see if it's a real issue or something else | 13:57
noonedeadpunk | Yeah, that's actually backported back to Yoga | 13:58 |
noonedeadpunk | https://security.openstack.org/ossa/OSSA-2023-003.html | 13:58 |
noonedeadpunk | and that's the release note covering your issue I believe https://review.opendev.org/c/openstack/cinder/+/882835/2/releasenotes/notes/redirect-detach-nova-4b7b7902d7d182e0.yaml#20 | 14:00
noonedeadpunk | `cinder now rejects user attachment delete requests for attachments that are being used by nova instances to ensure that no leftover devices are produced on the compute nodes which could be used to access another project's volumes.` | 14:01 |
spatel | You are saying it's required to use a nova service token? | 14:01
spatel | is this what you refer - https://docs.openstack.org/nova/latest/admin/configuration/service-user-token.html | 14:02 |
noonedeadpunk | I'm saying there used to be 2 api calls that allowed detaching a volume - one to cinder and another to nova | 14:03
noonedeadpunk | from now on requests directly to cinder will fail | 14:03
noonedeadpunk | So you have this https://docs.openstack.org/api-ref/block-storage/v3/index.html#detach-volume-from-server | 14:04 |
noonedeadpunk | and you have that https://docs.openstack.org/api-ref/compute/#detach-a-volume-from-an-instance | 14:04 |
noonedeadpunk | and now the first one can be called only by the nova service and not by a user | 14:05
noonedeadpunk | If I'm not mistaken and it's vice versa... | 14:06 |
spatel | Let me understand, You want me to use nova volume-detach command to detach volume? | 14:07 |
spatel | when you say nova api means what? | 14:07 |
spatel | Let me understand whole bug report first | 14:12 |
noonedeadpunk | or `openstack server remove volume` | 14:12 |
spatel | that didn't help :( | 14:14 |
spatel | I believe we need to configure something like this in cinder or nova - send_service_user_token = True | 14:14 |
spatel | because of the security fix cinder now won't allow detaching a volume without a valid token from nova.. | 14:15
spatel | trying to understand where and how I should add those options in config | 14:15
spatel | https://bugs.launchpad.net/cinder/+bug/2004555/comments/75 | 14:16 |
noonedeadpunk | You can check how we did that :) | 14:18 |
noonedeadpunk | so you should define service_user https://opendev.org/openstack/openstack-ansible-os_cinder/src/branch/master/templates/cinder.conf.j2#L193-L203 | 14:19 |
spatel | oh.. let me splunk OSA code.. | 14:19 |
noonedeadpunk | and also use service token roles https://opendev.org/openstack/openstack-ansible-os_cinder/src/branch/master/templates/cinder.conf.j2#L177-L179 | 14:19 |
spatel | I should be adding them in both NOVA and Cinder, correct? | 14:19
noonedeadpunk | and glance I guess | 14:19 |
spatel | 3 roles? | 14:20 |
noonedeadpunk | Well, role is `service` for all of them | 14:24 |
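A condensed sketch of what those linked template sections amount to, written as user_variables.yml overrides (the two override variables are OSA's standard conf-override hooks; the option names are the stock oslo/keystonemiddleware ones; all credential values are placeholders, and a recent OSA renders these sections for you):

```yaml
---
# Sketch: nova sends a service token alongside the user token when calling
# cinder, and cinder requires the "service" role on that token. Placeholder
# values only; on an up-to-date OSA the os_nova/os_cinder roles already
# template these sections, as in the files linked above.
nova_nova_conf_overrides:
  service_user:
    send_service_user_token: true
    auth_type: password
    auth_url: "https://keystone.example.com:5000/v3"
    username: nova
    password: "REPLACE_ME"
    user_domain_name: Default
    project_name: service
    project_domain_name: Default

cinder_cinder_conf_overrides:
  keystone_authtoken:
    service_token_roles: service
    service_token_roles_required: true
```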
admin1 | noonedeadpunk, from which tag in osa is the nova service token implemented ? | 14:30 |
admin1 | i see .. yoga and xena | 14:31 |
noonedeadpunk | it's not backported to xena :( | 14:42 |
noonedeadpunk | for yoga it's 25.4.0 | 14:43 |
noonedeadpunk | and minor upgrade could be quite breaking as well - I believe I wrote a release note to address that | 14:43 |
spatel | I hit this issue in my upgrade path so definitely worth keeping eyes | 14:44 |
anskiy | noonedeadpunk: that note only mentions major upgrades, if I understand correctly | 15:23 |
mgariepy | anyone here do multiple let'sencrypt domain/ips on a deployment ? | 15:23 |
anskiy | I'm planning an upgrade from 25.2.0 to 25.4.0, should I be concerned about that thing? | 15:24 |
* noonedeadpunk needs to check patch again | 15:24 | |
noonedeadpunk | anskiy: I think you're right, and the minor upgrade will just cover the vulnerability and resiliently enable usage of service tokens | 15:26
noonedeadpunk | or well, relatively resiliently | 15:26
anskiy | okay, thank you :) | 15:27 |
noonedeadpunk | as the problem was arising when you already require service roles, but users were not assigned the role in the first place | 15:27
noonedeadpunk | When upgrading to Yoga, you would get the role assigned, but it was not forced yet. And with this minor upgrade it will be forced | 15:28
spatel | noonedeadpunk it works now after adding service_user snippet :) | 16:16 |
spatel | Thanks for pointing that out | 16:16
admin1 | mgariepy, use case ? .. i usually use a wildcard | 18:22 |
mgariepy | having 2 different ips on the ctrl, one for api and the other for object storage | 18:24
mgariepy | admin1, ^^ | 18:25 |
admin1 | i have used a SAN for those, but not used letsencrypt | 18:33
jrosser | mgariepy: I think you can supply extra args to certbot through the haproxy role vars | 18:52 |
jrosser | with more ‘-d {{ fqdn}}’ as you need them | 18:52 |
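Something along these lines is presumably what that would look like in user_variables.yml; the variable name below is a hypothetical stand-in, so check the haproxy_server role defaults for the real hook before using it:

```yaml
---
# Hypothetical sketch only: pass extra -d options to certbot so one
# certificate covers additional names. The variable name is an assumption,
# not confirmed against the role; the real hook may differ per branch.
haproxy_ssl_letsencrypt_certbot_extra_args: >-
  -d {{ external_lb_vip_address }}
  -d object.example.com
```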
mgariepy | jrosser, yep i think i found the correct stuff | 18:53 |
mgariepy | i want to have 1 cert per ip/domain tho. | 18:53 |
jrosser | yeah I’m not so sure we can do that just now | 18:53 |
mgariepy | will need to do some stuff but it should work. | 18:53 |
mgariepy | i'll see and patch as needed i guess. | 18:53 |
jrosser | we are looking at enabling s3 static sites which needs another ip/dns on the same haproxy | 18:53 |
mgariepy | the keepalived part is kinda simple. | 18:54 |
mgariepy | with keepalived_instances_overrides | 18:54 |
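For the keepalived side, a rough sketch of the kind of override being referred to, adding a second public VIP for the object endpoint (the inner structure is meant to mirror OSA's default keepalived_instances definition, and the addresses/interface are placeholders):

```yaml
---
# Sketch with placeholder addresses: give the external keepalived instance a
# second VIP so the object-storage frontend can bind to its own IP.
keepalived_instances_overrides:
  external:
    vips:
      - "203.0.113.10 dev br-ext"   # existing API VIP (placeholder)
      - "203.0.113.11 dev br-ext"   # additional VIP for the object endpoint
```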
jrosser | unclear if it is possible to have two haproxy front ends with different LE setups | 18:54 |
jrosser | or if it’s ok to share the same very with a SAN | 18:55 |
jrosser | *same cert | 18:55 |
mgariepy | i'll dig a bit, might need some custom haproxy front/back i guess.. | 18:59 |
mgariepy | hmm yeah i guess it would need some adjustment.. | 19:24 |
mgariepy | jrosser, i guess we would need to refactor part of the code. here. | 19:56 |
mgariepy | the haproxy is probably the simplest part since it can be bound to a specific ip for a frontend | 19:58