15:00:06 <noonedeadpunk> #startmeeting openstack_ansible_meeting
15:00:06 <opendevmeet> Meeting started Tue Oct 25 15:00:06 2022 UTC and is due to finish in 60 minutes. The chair is noonedeadpunk. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:06 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:06 <opendevmeet> The meeting name has been set to 'openstack_ansible_meeting'
15:00:11 <noonedeadpunk> #topic rollcall
15:00:28 <noonedeadpunk> o/
15:02:50 <damiandabrowski> hi!
15:03:58 <jamesdenton> hi
15:04:59 <mgariepy> hey !
15:05:10 <noonedeadpunk> #topic office hours
15:05:28 * noonedeadpunk checking if any new bugs are around actually
15:06:00 <noonedeadpunk> Yes, we have one
15:06:02 <noonedeadpunk> #link https://bugs.launchpad.net/openstack-ansible/+bug/1993575
15:06:20 <noonedeadpunk> But to be frank - I'm not sure overhead worth to fit this specific usecase
15:07:07 <noonedeadpunk> I do wonder if mentioned workaround would work and solve request, so we can document this better instead
15:07:37 <noonedeadpunk> Adri2000: would be great if you could try this out one day and return back to us if that solution is good enough for you
15:07:46 <noonedeadpunk> Ok, moving on.
15:08:14 <noonedeadpunk> I have some reflection about zookeeper that we agreed on PTG to deploy for cinder/designate/etc
15:09:02 <noonedeadpunk> So what I was using in one of openstack deployments was quite modified fork of https://opendev.org/windmill/ansible-role-zookeeper
15:09:50 <noonedeadpunk> I tried to merge my changes to it but maintainer was super sceptical about any changes to it as it should be super minimal and simple. But basically it even does not configure clustering properly
15:09:52 <johnsom> You are going with zookeeper over redis?
15:10:18 <noonedeadpunk> johnsom: well, we were doubting between etcd vs zookeeper
15:10:28 <johnsom> We have picked Redis as Octavia and some others need it too
15:10:39 <johnsom> Ah, yeah, Designate can't use etcd
15:10:52 <noonedeadpunk> johnsom: but I think you use tooz for coordination anyway?
15:10:54 <johnsom> tooz group membership doesn't work
15:11:19 <noonedeadpunk> as tooz says that `zookeeper is the reference implementation`
15:11:22 <johnsom> Yeah, it's through tooz
15:11:33 <johnsom> Yeah, zookeeper will work
15:11:37 <johnsom> etcd won't
15:11:42 <noonedeadpunk> Also iirc redis quite pita in terms of clustering?
15:11:50 <noonedeadpunk> well, cinder recommends etcd :D
15:12:19 <johnsom> What isn't a pita for clustering, lol
15:12:29 <noonedeadpunk> Well, zookeeper super simple
15:12:30 <johnsom> FYI: https://docs.openstack.org/tooz/latest/user/compatibility.html#grouping
15:12:47 <noonedeadpunk> yeah, zookeeper seems like most featureful thing?
15:13:25 <johnsom> Yeah, zookeeper is fine. I just wanted to mention that other tools are taking a different path.
15:14:31 <noonedeadpunk> just zookeeper worked out of the box without much hooks. Nasty thing about it is actually java thing I don't like....
15:14:44 <noonedeadpunk> but other than that...
15:15:04 <noonedeadpunk> btw. In what scenarios coordination is required for octavia?
15:15:23 <johnsom> Octavia->Taskflow->Tooz
15:15:41 <johnsom> Taskflow is also using Redis key/value for the jobboard option
15:16:11 <johnsom> It's a new-ish requirement if you enable the jobboard capability
15:16:29 <noonedeadpunk> and in this case redis is not utilized through tooz?
15:16:52 <damiandabrowski> it may be a dumb question but I see that tooz has mysql driver. So leveraging this may be the simplest option when we already have mysql in place.
15:16:54 <johnsom> It is through tooz, but also key/value store I believe
15:16:57 <damiandabrowski> but i assume there are some disadvantages?
15:17:20 <noonedeadpunk> Well zookeeper also allows key/store
15:17:29 <johnsom> Tooz mysql driver doesn't support group membership either
15:18:11 <noonedeadpunk> so I kind of trying to understand if redis only is required or it can work with any driver that has feature capability from tooz
15:18:48 <johnsom> I have not been deep in the jobboard work, it would probably be best to ask in the #openstack-octavia channel for clarity
15:19:23 <noonedeadpunk> aha, ok, will do
15:20:36 <noonedeadpunk> johnsom: seems it can be zookeeper :D https://docs.openstack.org/octavia/latest/configuration/configref.html#task_flow.jobboard_backend_driver
15:21:44 <noonedeadpunk> damiandabrowski: mysql was quite weird thing for coordination feature-wise
15:22:28 <noonedeadpunk> Eventually galera is async master/master. So weird things can arise
15:23:04 <noonedeadpunk> Also they say `Does not work when MySQL replicates from one server to another` which really sounds like things might go wrong
15:23:29 <johnsom> Yeah, mysql is not a good option, it's missing a lot of features.
15:24:02 <noonedeadpunk> but returning to my original pitch - I think we should create role from scratch instead of re-using pabelanger work....
15:25:31 <damiandabrowski> thanks for an explanation
15:25:38 <noonedeadpunk> As for clustering his stance was that role deploys default template and you can get another role or post task to set template to correct one...
15:25:56 <noonedeadpunk> with lineinfile
15:26:06 <noonedeadpunk> (or smth like that)
15:27:23 <noonedeadpunk> But I kind of feel bad about having multiple things under opendev umbrella that does almost same stuff
15:28:27 <noonedeadpunk> ok, moving next - glance multiple locations issue that we expose for ceph
15:28:39 <noonedeadpunk> damiandabrowski: do you want to share smth regarding it
15:30:09 <damiandabrowski> regarding zuul: poor you, i just noticed all your changes are abandoned :D https://review.opendev.org/q/project:+windmill/ansible-role-zookeeper+AND+owner:noonedeadpunk
15:30:55 <damiandabrowski> regarding glance multiple locations: we already merged 2 changes which should make things better. Later this week I'll try to contact glance guys and ask if it's really necessary to do something else
15:31:40 <damiandabrowski> ah sorry, only one patch was merged so far, it would be awesome to merge second one: https://review.opendev.org/c/openstack/openstack-ansible-os_glance/+/862171
15:32:21 <noonedeadpunk> yeah, I'm not sure about backporting, but I'm open for discussion, as it's not very strong opinion
15:33:34 <damiandabrowski> i don't have a strong opinion either. It's a security improvement but on the other hand we're changing default values...
15:39:15 <noonedeadpunk> I have also pushed some patches for ceph
15:39:42 <noonedeadpunk> So eventually this one https://review.opendev.org/c/openstack/openstack-ansible/+/862508
15:40:12 <noonedeadpunk> The question here - should we move also tempest/rally outside of setup-openstack to some setup-testsuites or smth?
15:40:48 <noonedeadpunk> or just abandon idea of moving ceph playbooks out of setup-openstack/infrastructure
15:40:53 <jamesdenton> might be nice to separate those, as i suspect they're really only used in CI
15:41:23 <noonedeadpunk> Well, not only in CI - we use tempest internally :D
15:41:30 <jamesdenton> we should be
15:41:42 <noonedeadpunk> but it does make sense to me to split them out as well.
15:42:06 <noonedeadpunk> ok then, if it's not only me I will propose smth
15:42:11 <noonedeadpunk> not sure about naming though
15:43:13 <noonedeadpunk> ah, btw, on PTG there was agreement that Y->AA upgrade should be done and tested on Ubuntu focal (20.04). So we should carry it until AA and drop only afterwards
15:45:30 <noonedeadpunk> I think that's all from my side
15:51:05 <damiandabrowski> honestly I'm not a fan of moving tempest/rally outside setup-openstack :/ it's about consistency, at the end of the day they are openstack services
15:53:07 <damiandabrowski> but considering that moving ceph out of setup-infrastructure brings another problems, maybe we should really think about implementing some variable like `integrated_ceph` which controls that?
15:53:14 <damiandabrowski> or if we already say in docs that integrated ceph is not recommended scenario - don't care about it at all
15:57:16 <noonedeadpunk> rally is not openstack service fwiw
15:57:55 <noonedeadpunk> sorry, I'm not really understanding purpose of integrated_ceph variable?
15:58:43 <noonedeadpunk> damiandabrowski:
15:58:44 <anskiy> noonedeadpunk: I believe, it's for doing it like this: https://opendev.org/openstack/openstack-ansible/src/branch/master/playbooks/setup-infrastructure.yml#L31-L33
15:59:14 * anskiy is actually a second user of ceph-ansible
15:59:15 <noonedeadpunk> yeah, but we control if ceph is being deployed or not by having appropriate group in inventory
15:59:53 <anskiy> from what I remember, this thing should be some kind of protection against accidental Ceph upgrade
16:00:02 <noonedeadpunk> if there's no ceph-mon group - you're safe
16:00:45 <anskiy> there is one for me :P
16:00:59 <noonedeadpunk> well... I mean.
Then you should define it during runtime each time you want to execute any ceph action which is weird
16:01:25 <damiandabrowski> yeah, small correction: this variable shouldn't define if integrated ceph is used or not, but whether it needs to be a part of setup-infrastructure.yml
16:01:43 <noonedeadpunk> IMO that kind of brings more confusion or I'm not fully realize what behavior this variable should achieve
16:02:05 <damiandabrowski> but idk, as someone who uses integrated ceph in production environment, having ceph in setup-infrastructure.yml is not an issue at all :D
16:03:07 <mgariepy> i rarely use setup-infrastructure.yml playbook i prefer running each play one by one.
16:03:10 <damiandabrowski> executing the whole setup-infrastructure.yml or setup-openstack.yml on already running environment is not the safest thing anyway
16:03:38 <noonedeadpunk> folks, I'm actually quite open for suggestions. As you're right damiandabrowski - most tricky part are upgrades
16:04:04 <noonedeadpunk> and we do launch setup-infrastructure/setup-openstack in our run_upgrade.sh script
16:04:11 <noonedeadpunk> that will touch ceph when it should not
16:04:25 <ElDuderino> on that note, question - we run 'setup-infrastructure.yml' regularly, with success (still on rocky) and ran the upgrade scripts from pike to queens to rocky.
16:04:35 <ElDuderino> is there a 'better' way?
16:05:21 <noonedeadpunk> Tbh I don't think we should consider running setup-infrastructure.yml / setup-openstack as bad idea - these must be idempotent
16:05:47 <noonedeadpunk> ElDuderino: depending on better way for what :D
16:06:16 <anskiy> noonedeadpunk: I might miss some good point about danger that is running ceph playbook during upgrades, but: if it is part of OSA (and it clearly is -- there is some host groups defined in openstack_user_config for ceph), then what is the actual problem with that?
16:06:52 <ElDuderino> Noonedeadpunk: sorry, I didn't realize you were all still referring to the ceph bits.
Disregard :/
16:07:12 <mgariepy> yeah i dont run it not because i don't have confidence it won't work. it's just that this way i can control more easily the time of each run. and split upgrades over a couple of days if i need.
16:07:47 <noonedeadpunk> anskiy: well, it's actually what we're trying to clarify - ceph-ansible is quite arguably a part - we intended to use it mostly for CI/AIO rather than production
16:08:09 <noonedeadpunk> anskiy: so we bump ceph-ansible version, but we don't really test upgrade path for ceph
16:08:26 <noonedeadpunk> so you might get some unintended ceph upgrade when upgrading osa
16:08:39 <noonedeadpunk> *upgrading openstack
16:08:58 <anskiy> well, it looks for me intended: as I've deployed this Ceph cluster via OSA...
16:09:50 <anskiy> I do believe there would be some about Ceph version being bumped in the release notes too, right?
16:10:12 <noonedeadpunk> yup, there will be
16:10:42 <noonedeadpunk> anskiy: so we're discussing patch https://review.opendev.org/c/openstack/openstack-ansible/+/862508 that actually adjusts doc in this way to say that while it's an option, we actually don't provide real support for it
16:11:37 <noonedeadpunk> also it is a bit tighten with uncertainty of future of ceph-ansible
16:13:04 <noonedeadpunk> ok, from what I got damiandabrowski votes to just abandon this patch
16:13:20 <anskiy> noonedeadpunk: yeah, I've read their README, but it just sounds too convenient for me: I only provide network configuration on my nodes -- everything else openstack-related is installed by OSA
16:14:42 <damiandabrowski> I completely agree that we should mention in docs that upgrading integrated ceph is not covered by our tests :D
16:15:38 <damiandabrowski> but i also agree with anskiy , when you're having ceph integrated with OSA, then upgrading it with setup-infrastructure.yml isn't really unintended
16:15:40 <noonedeadpunk> anskiy: it's hard to disagree. But like actions that you execute should be clear.
And for me it's not always clear that setup-openstack will mess up with your rgw as well
16:15:45 <anskiy> noonedeadpunk: I think the patch is okay, if I could put something in user_variables that says "please_deploy_ceph_for_me_i_know_its_not_tested: true"
16:17:56 <noonedeadpunk> ok, then I think I need to sleep with it to realize value of such variable
16:18:25 <noonedeadpunk> Tbh, for me it would make more sense to have some variable like ceph_upgrade: true in ceph-ansible itself
16:18:42 <noonedeadpunk> (like we do for galera and rabbit)
16:19:09 <noonedeadpunk> that will really solve a lot of pain
16:19:48 <damiandabrowski> technically you have `upgrade_ceph_packages`
16:19:52 <damiandabrowski> https://github.com/ceph/ceph-ansible/blob/371592a8fb1896183aa1b55de9963f7b9a4d24f3/roles/ceph-defaults/defaults/main.yml#L115
16:20:18 <damiandabrowski> not sure if it fully solves the problem though
16:21:07 <noonedeadpunk> also, I think you're supposed to upgrade ceph-ansible with https://github.com/ceph/ceph-ansible/blob/371592a8fb1896183aa1b55de9963f7b9a4d24f3/infrastructure-playbooks/rolling_update.yml aren't you?
16:21:34 <noonedeadpunk> as I think our playbooks might be dumb enough to upgrade it in non-rolling manner...
16:22:06 <noonedeadpunk> yes, we don't have "serial" in ceph playbooks as of today
16:22:35 <damiandabrowski> i never upgraded ceph with ceph-ansible but when I was modifying OSD configs etc., ceph-install.yml always restarted OSDs one by one
16:22:57 <anskiy> noonedeadpunk: so, the solution would be to apply https://review.opendev.org/c/openstack/openstack-ansible/+/862508 and add info to the upgrade docs to not forget to run https://github.com/ceph/ceph-ansible/blob/371592a8fb1896183aa1b55de9963f7b9a4d24f3/infrastructure-playbooks/rolling_update.yml if ceph-ansible version was bumped?
16:23:22 <noonedeadpunk> damiandabrowski: huh, interesting....
16:23:30 <damiandabrowski> so I'm not sure what rolling_update.yml brings except few safety checks and overriding `upgrade_ceph_packages` value
16:24:34 <anskiy> damiandabrowski: so it's just like osa's upgrade doc, but in yaml :)
16:24:41 <anskiy> and for ceph
16:25:12 <noonedeadpunk> I think what he meant - you can just run ceph-install.yml -e upgrade_ceph_packages=true
16:25:20 <noonedeadpunk> and get kind of same result
16:27:37 <noonedeadpunk> so basically - no ceph upgrade will happen unless you provide -e upgrade_ceph_packages=true.
16:28:09 <damiandabrowski> that's at least what i think :D
16:28:54 <damiandabrowski> i also found out how ceph-ansible restarts OSDs one by one
16:29:03 <damiandabrowski> handlers are quite complex there and uses custom scripts
16:29:04 <damiandabrowski> https://github.com/ceph/ceph-ansible/tree/bb849a55861e3900362ec46e68a02754b2c892ec/roles/ceph-handler/tasks
16:29:18 <damiandabrowski> for ex. this one is responsible for restarting OSDs: https://github.com/ceph/ceph-ansible/blob/bb849a55861e3900362ec46e68a02754b2c892ec/roles/ceph-handler/templates/restart_osd_daemon.sh.j2
16:30:22 <noonedeadpunk> yup, already found that. They did quite extra mile to protect from stupid playbook
16:30:39 <noonedeadpunk> well, I mean. Then we can leave thing as is indeed....
16:31:16 <noonedeadpunk> as basically ceph-ansible version bump or change of ceph_stable_release won't result in package upgrade unless you set `upgrade_ceph_packages`
16:32:58 <noonedeadpunk> I still like idea though that ceph-rgw-install should not be part of setup-openstack as it has nothing to do with openstack... But can live with current state as well
16:33:05 <noonedeadpunk> oh, totally forgot
16:33:08 <noonedeadpunk> #endmeeting
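
Editor's note on the tooz driver discussion (15:10 to 15:23): the sketch below only restates, as a lookup table, which coordination drivers the participants said do or do not support group membership. It is a minimal illustration and not an authoritative tooz compatibility matrix. The names `GROUP_MEMBERSHIP_PER_MEETING` and `candidate_backends` are hypothetical, and Redis is left out because nobody in the meeting stated its group-membership support explicitly. The compatibility page linked at 15:12:30 remains the reference.

```python
# Group-membership support per tooz driver, exactly as stated in the meeting
# above. Assumption: this mirrors the chat only, not the upstream tooz docs.
GROUP_MEMBERSHIP_PER_MEETING = {
    "zookeeper": True,   # "zookeeper will work"
    "etcd": False,       # "tooz group membership doesn't work"
    "mysql": False,      # "Tooz mysql driver doesn't support group membership either"
}


def candidate_backends(needs_group_membership):
    """Hypothetical helper: list the drivers that satisfy the requirement."""
    return sorted(
        name
        for name, has_groups in GROUP_MEMBERSHIP_PER_MEETING.items()
        if has_groups or not needs_group_membership
    )


if __name__ == "__main__":
    # A service such as Designate, which needs group membership per the log.
    print(candidate_backends(needs_group_membership=True))
```

Under these assumptions, a service that needs group membership is left with zookeeper only, which matches the direction the meeting settled on.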