15:00:06 <noonedeadpunk> #startmeeting openstack_ansible_meeting
15:00:06 <opendevmeet> Meeting started Tue Oct 25 15:00:06 2022 UTC and is due to finish in 60 minutes.  The chair is noonedeadpunk. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:06 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:06 <opendevmeet> The meeting name has been set to 'openstack_ansible_meeting'
15:00:11 <noonedeadpunk> #topic rollcall
15:00:28 <noonedeadpunk> o/
15:02:50 <damiandabrowski> hi!
15:03:58 <jamesdenton> hi
15:04:59 <mgariepy> hey !
15:05:10 <noonedeadpunk> #topic office hours
15:05:28 * noonedeadpunk checking if any new bugs are around actually
15:06:00 <noonedeadpunk> Yes, we have one
15:06:02 <noonedeadpunk> #link https://bugs.launchpad.net/openstack-ansible/+bug/1993575
15:06:20 <noonedeadpunk> But to be frank - I'm not sure the overhead is worth it for this specific usecase
15:07:07 <noonedeadpunk> I do wonder if the mentioned workaround would work and solve the request, so we can document this better instead
15:07:37 <noonedeadpunk> Adri2000: would be great if you could try this out one day and get back to us on whether that solution is good enough for you
15:07:46 <noonedeadpunk> Ok, moving on.
15:08:14 <noonedeadpunk> I have some reflection about zookeeper that we agreed on PTG to deploy for cinder/designate/etc
15:09:02 <noonedeadpunk> So what I was using in one of my openstack deployments was a quite modified fork of https://opendev.org/windmill/ansible-role-zookeeper
15:09:50 <noonedeadpunk> I tried to merge my changes into it but the maintainer was super sceptical about any changes, as it should stay super minimal and simple. But basically it doesn't even configure clustering properly
15:09:52 <johnsom> You are going with zookeeper over redis?
15:10:18 <noonedeadpunk> johnsom: well, we were debating between etcd vs zookeeper
15:10:28 <johnsom> We have picked Redis as Octavia and some others need it too
15:10:39 <johnsom> Ah, yeah, Designate  can't use etcd
15:10:52 <noonedeadpunk> johnsom: but I think you use tooz for coordination anyway?
15:10:54 <johnsom> tooz group membership doesn't work
15:11:19 <noonedeadpunk> as tooz says that `zookeeper is the reference implementation`
15:11:22 <johnsom> Yeah, it's through tooz
15:11:33 <johnsom> Yeah, zookeeper will work
15:11:37 <johnsom> etcd won't
15:11:42 <noonedeadpunk> Also iirc redis is quite a pita in terms of clustering?
15:11:50 <noonedeadpunk> well, cinder recommends etcd :D
15:12:19 <johnsom> What isn't a pita for clustering, lol
15:12:29 <noonedeadpunk> Well, zookeeper is super simple
15:12:30 <johnsom> FYI: https://docs.openstack.org/tooz/latest/user/compatibility.html#grouping
15:12:47 <noonedeadpunk> yeah, zookeeper seems like the most featureful option?
15:13:25 <johnsom> Yeah, zookeeper is fine. I just wanted to mention that other tools are taking a different path.
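For context on the driver discussion above: a service reaches its tooz backend through a single `backend_url` in its `[coordination]` section. A hedged sketch of wiring that up via OSA's config-override mechanism in `user_variables.yml` (the override variable follows OSA's usual `<service>_<service>_conf_overrides` pattern; the zookeeper address is a made-up example):

```yaml
# Hedged sketch: point cinder's tooz coordination at a zookeeper ensemble.
# The override variable is OSA's standard cinder.conf override hook; the
# zookeeper address below is illustrative only.
cinder_cinder_conf_overrides:
  coordination:
    backend_url: "zookeeper://172.29.236.10:2181"
```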
15:14:31 <noonedeadpunk> zookeeper just worked out of the box without many hooks. The nasty thing about it is actually the java dependency, which I don't like....
15:14:44 <noonedeadpunk> but other than that...
15:15:04 <noonedeadpunk> btw. In what scenarios coordination is required for octavia?
15:15:23 <johnsom> Octavia->Taskflow->Tooz
15:15:41 <johnsom> Taskflow is also using Redis key/value for the jobboard option
15:16:11 <johnsom> It's a new-ish requirement if you enable the jobboard capability
15:16:29 <noonedeadpunk> and in this case redis is not utilized through tooz?
15:16:52 <damiandabrowski> it may be a dumb question but I see that tooz has a mysql driver. So leveraging this may be the simplest option since we already have mysql in place.
15:16:54 <johnsom> It is through tooz, but also key/value store I believe
15:16:57 <damiandabrowski> but i assume there are some disadvantages?
15:17:20 <noonedeadpunk> Well zookeeper also provides a key/value store
15:17:29 <johnsom> Tooz mysql driver doesn't support group membership either
15:18:11 <noonedeadpunk> so I'm kind of trying to understand if redis specifically is required or it can work with any tooz driver that has the needed feature capability
15:18:48 <johnsom> I have not been deep in the jobboard work, it would probably be best to ask in the #openstack-octavia channel for clarity
15:19:23 <noonedeadpunk> aha, ok, will do
15:20:36 <noonedeadpunk> johnsom: seems it can be zookeeper :D https://docs.openstack.org/octavia/latest/configuration/configref.html#task_flow.jobboard_backend_driver
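Per the configref linked above, the jobboard driver is indeed selectable. A hedged sketch of choosing the zookeeper driver through an OSA override (the override variable name is assumed from OSA's usual pattern; option names come from the linked configref; the host address is illustrative):

```yaml
# Hedged sketch: enable octavia's jobboard backed by zookeeper instead of
# the default redis driver. Option names per the linked configref; the
# host address is a made-up example.
octavia_octavia_conf_overrides:
  task_flow:
    jobboard_enabled: True
    jobboard_backend_driver: "zookeeper_taskflow_driver"
    jobboard_backend_hosts: "172.29.236.10"
    jobboard_backend_port: 2181
```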
15:21:44 <noonedeadpunk> damiandabrowski: mysql was a quite weird choice for coordination feature-wise
15:22:28 <noonedeadpunk> Essentially galera is async master/master. So weird things can arise
15:23:04 <noonedeadpunk> Also they say `Does not work when MySQL replicates from one server to another` which really sounds like things might go wrong
15:23:29 <johnsom> Yeah, mysql is not a good option, it's missing a lot of features.
15:24:02 <noonedeadpunk> but returning to my original pitch - I think we should create a role from scratch instead of re-using pabelanger's work....
15:25:31 <damiandabrowski> thanks for the explanation
15:25:38 <noonedeadpunk> As for clustering, his stance was that the role deploys a default template and you can use another role or a post task to set the template to the correct one...
15:25:56 <noonedeadpunk> with lineinfile
15:26:06 <noonedeadpunk> (or smth like that)
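A rough sketch of the workaround being described, i.e. letting the upstream role drop its default zoo.cfg and patching the cluster members in with a post task; all paths, group names and the handler name here are illustrative assumptions, not taken from the role:

```yaml
# Illustrative post task: append a server entry for each ensemble member to
# the default zoo.cfg the role installed. Group name, path and handler are
# assumptions for the sketch.
- name: Add zookeeper ensemble members to zoo.cfg
  ansible.builtin.lineinfile:
    path: /etc/zookeeper/conf/zoo.cfg
    regexp: "^server\\.{{ idx + 1 }}="
    line: "server.{{ idx + 1 }}={{ hostvars[item]['ansible_host'] }}:2888:3888"
  loop: "{{ groups['zookeeper_all'] | default([]) }}"
  loop_control:
    index_var: idx
  notify: Restart zookeeper
```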
15:27:23 <noonedeadpunk> But I kind of feel bad about having multiple things under the opendev umbrella that do almost the same stuff
15:28:27 <noonedeadpunk> ok, moving on - the glance multiple locations issue that we expose for ceph
15:28:39 <noonedeadpunk> damiandabrowski: do you want to share smth regarding it
15:30:09 <damiandabrowski> regarding zookeeper: poor you, i just noticed all your changes are abandoned :D https://review.opendev.org/q/project:+windmill/ansible-role-zookeeper+AND+owner:noonedeadpunk
15:30:55 <damiandabrowski> regarding glance multiple locations: we already merged 2 changes which should make things better. Later this week I'll try to contact glance guys and ask if it's really necessary to do something else
15:31:40 <damiandabrowski> ah sorry, only one patch was merged so far, it would be awesome to merge the second one: https://review.opendev.org/c/openstack/openstack-ansible-os_glance/+/862171
15:32:21 <noonedeadpunk> yeah, I'm not sure about backporting, but I'm open for discussion, as it's not very strong opinion
15:33:34 <damiandabrowski> i don't have a strong opinion either. It's a security improvement but on the other hand we're changing default values...
15:39:15 <noonedeadpunk> I have also pushed some patches for ceph
15:39:42 <noonedeadpunk> So eventually this one https://review.opendev.org/c/openstack/openstack-ansible/+/862508
15:40:12 <noonedeadpunk> The question here - should we also move tempest/rally outside of setup-openstack to some setup-testsuites or smth?
15:40:48 <noonedeadpunk> or just abandon the idea of moving ceph playbooks out of setup-openstack/infrastructure
15:40:53 <jamesdenton> might be nice to separate those, as i suspect they're really only used in CI
15:41:23 <noonedeadpunk> Well, not only in CI - we use tempest internally :D
15:41:30 <jamesdenton> we should be
15:41:42 <noonedeadpunk> but it does make sense to me to split them out as well.
15:42:06 <noonedeadpunk> ok then, if it's not only me I will propose smth
15:42:11 <noonedeadpunk> not sure about naming though
15:43:13 <noonedeadpunk> ah, btw, on PTG there was agreement that Y->AA upgrade should be done and tested on Ubuntu focal (20.04). So we should carry it until AA and drop only afterwards
15:45:30 <noonedeadpunk> I think that's all from my side
15:51:05 <damiandabrowski> honestly I'm not a fan of moving tempest/rally outside setup-openstack :/ it's about consistency, at the end of the day they are openstack services
15:53:07 <damiandabrowski> but considering that moving ceph out of setup-infrastructure brings other problems, maybe we should really think about implementing some variable like `integrated_ceph` which controls that?
15:53:14 <damiandabrowski> or if we already say in docs that integrated ceph is not recommended scenario - don't care about it at all
15:57:16 <noonedeadpunk> rally is not openstack service fwiw
15:57:55 <noonedeadpunk> sorry, I don't really understand the purpose of the integrated_ceph variable?
15:58:43 <noonedeadpunk> damiandabrowski:
15:58:44 <anskiy> noonedeadpunk: I believe, it's for doing it like this: https://opendev.org/openstack/openstack-ansible/src/branch/master/playbooks/setup-infrastructure.yml#L31-L33
15:59:14 * anskiy is actually a second user of ceph-ansible
15:59:15 <noonedeadpunk> yeah, but we control if ceph is being deployed or not by having appropriate group in inventory
15:59:53 <anskiy> from what I remember, this thing should be some kind of protection against accidental Ceph upgrade
16:00:02 <noonedeadpunk> if there's no ceph-mon group - you're safe
16:00:45 <anskiy> there is one for me :P
16:00:59 <noonedeadpunk> well... I mean. Then you should define it during runtime each time you want to execute any ceph action which is weird
16:01:25 <damiandabrowski> yeah, small correction: this variable shouldn't define whether integrated ceph is used or not, but whether it needs to be part of setup-infrastructure.yml
16:01:43 <noonedeadpunk> IMO that kind of brings more confusion, or maybe I don't fully grasp what behavior this variable should achieve
16:02:05 <damiandabrowski> but idk, as someone who uses integrated ceph in a production environment, having ceph in setup-infrastructure.yml is not an issue at all :D
16:03:07 <mgariepy> i rarely use the setup-infrastructure.yml playbook, i prefer running each play one by one.
16:03:10 <damiandabrowski> executing the whole setup-infrastructure.yml or setup-openstack.yml on an already running environment is not the safest thing anyway
16:03:38 <noonedeadpunk> folks, I'm actually quite open to suggestions. And you're right damiandabrowski - the trickiest part is upgrades
16:04:04 <noonedeadpunk> and we do launch setup-infrastructure/setup-openstack in our run_upgrade.sh script
16:04:11 <noonedeadpunk> that will touch ceph when it should not
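One hedged way such a guard could look in the integrated playbooks, sketched against the setup-infrastructure.yml import chain (the `integrated_ceph` variable name is taken from the discussion above; the playbook layout and group name are assumptions):

```yaml
# Illustrative guard: only pull the ceph plays into setup-infrastructure.yml
# when the deployer opts in and ceph-mon hosts actually exist in inventory.
- import_playbook: ceph-install.yml
  when:
    - integrated_ceph | default(false) | bool
    - groups['ceph-mon'] | default([]) | length > 0
```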
16:04:25 <ElDuderino> on that note, question - we run 'setup-infrastructure.yml' regularly, with success (still on rocky) and ran the upgrade scripts from pike to queens to rocky.
16:04:35 <ElDuderino> is there a 'better' way?
16:05:21 <noonedeadpunk> Tbh I don't think we should consider running setup-infrastructure.yml / setup-openstack as a bad idea - these must be idempotent
16:05:47 <noonedeadpunk> ElDuderino: depending on better way for what :D
16:06:16 <anskiy> noonedeadpunk: I might be missing some good point about the danger of running the ceph playbook during upgrades, but: if it is part of OSA (and it clearly is -- there are host groups defined in openstack_user_config for ceph), then what is the actual problem with that?
16:06:52 <ElDuderino> Noonedeadpunk: sorry, I didn't realize you were all still referring to the ceph bits. Disregard :/
16:07:12 <mgariepy> yeah, i don't skip it because i lack confidence that it will work. it's just that this way i can control the timing of each run more easily, and split upgrades over a couple of days if i need.
16:07:47 <noonedeadpunk> anskiy: well, that's actually what we're trying to clarify - ceph-ansible is only arguably a part - we intended to use it mostly for CI/AIO rather than production
16:08:09 <noonedeadpunk> anskiy: so we bump ceph-ansible version, but we don't really test upgrade path for ceph
16:08:26 <noonedeadpunk> so you might get some unintended ceph upgrade when upgrading osa
16:08:39 <noonedeadpunk> *upgrading openstack
16:08:58 <anskiy> well, it looks intended to me: I deployed this Ceph cluster via OSA after all...
16:09:50 <anskiy> I do believe there would be something about the Ceph version being bumped in the release notes too, right?
16:10:12 <noonedeadpunk> yup, there will be
16:10:42 <noonedeadpunk> anskiy: so we're discussing patch https://review.opendev.org/c/openstack/openstack-ansible/+/862508 that actually adjusts doc in this way to say that while it's an option, we actually don't provide real support for it
16:11:37 <noonedeadpunk> also it is a bit tied to the uncertainty about the future of ceph-ansible
16:13:04 <noonedeadpunk> ok, from what I got damiandabrowski votes to just abandon this patch
16:13:20 <anskiy> noonedeadpunk: yeah, I've read their README, but it just sounds too convenient for me: I only provide network configuration on my nodes -- everything else openstack-related is installed by OSA
16:14:42 <damiandabrowski> I completely agree that we should mention in docs that upgrading integrated ceph is not covered by our tests :D
16:15:38 <damiandabrowski> but i also agree with anskiy , when you're having ceph integrated with OSA, then upgrading it with setup-infrastructure.yml isn't really unintended
16:15:40 <noonedeadpunk> anskiy: it's hard to disagree. But the actions that you execute should be clear. And for me it's not always clear that setup-openstack will mess with your rgw as well
16:15:45 <anskiy> noonedeadpunk: I think the patch is okay, if I could put something in user_variables that says "please_deploy_ceph_for_me_i_know_its_not_tested: true"
16:17:56 <noonedeadpunk> ok, then I think I need to sleep on it to realize the value of such a variable
16:18:25 <noonedeadpunk> Tbh, for me it would make more sense to have some variable like ceph_upgrade: true in ceph-ansible itself
16:18:42 <noonedeadpunk> (like we do for galera and rabbit)
16:19:09 <noonedeadpunk> that will really solve a lot of pain
16:19:48 <damiandabrowski> technically you have `upgrade_ceph_packages`
16:19:52 <damiandabrowski> https://github.com/ceph/ceph-ansible/blob/371592a8fb1896183aa1b55de9963f7b9a4d24f3/roles/ceph-defaults/defaults/main.yml#L115
16:20:18 <damiandabrowski> not sure if it fully solves the problem though
16:21:07 <noonedeadpunk> also, I think you're supposed to upgrade ceph with ceph-ansible's https://github.com/ceph/ceph-ansible/blob/371592a8fb1896183aa1b55de9963f7b9a4d24f3/infrastructure-playbooks/rolling_update.yml aren't you?
16:21:34 <noonedeadpunk> as I think our playbooks might be dumb enough to upgrade it in a non-rolling manner...
16:22:06 <noonedeadpunk> yes, we don't have "serial" in ceph playbooks as of today
16:22:35 <damiandabrowski> i never upgraded ceph with ceph-ansible, but when I was modifying OSD configs etc., ceph-install.yml always restarted OSDs one by one
16:22:57 <anskiy> noonedeadpunk: so, the solution would be to apply https://review.opendev.org/c/openstack/openstack-ansible/+/862508 and add info to the upgrade docs to not forget to run https://github.com/ceph/ceph-ansible/blob/371592a8fb1896183aa1b55de9963f7b9a4d24f3/infrastructure-playbooks/rolling_update.yml if ceph-ansible version was bumped?
16:23:22 <noonedeadpunk> damiandabrowski: huh, interesting....
16:23:30 <damiandabrowski> so I'm not sure what rolling_update.yml brings except a few safety checks and overriding the `upgrade_ceph_packages` value
16:24:34 <anskiy> damiandabrowski: so it's just like osa's upgrade doc, but in yaml :)
16:24:41 <anskiy> and for ceph
16:25:12 <noonedeadpunk> I think what he meant is - you can just run ceph-install.yml -e upgrade_ceph_packages=true
16:25:20 <noonedeadpunk> and get kind of same result
16:27:37 <noonedeadpunk> so basically - no ceph upgrade will happen unless you provide -e upgrade_ceph_packages=true.
16:28:09 <damiandabrowski> that's at least what i think :D
16:28:54 <damiandabrowski> i also found out how ceph-ansible restarts OSDs one by one
16:29:03 <damiandabrowski> handlers are quite complex there and use custom scripts
16:29:04 <damiandabrowski> https://github.com/ceph/ceph-ansible/tree/bb849a55861e3900362ec46e68a02754b2c892ec/roles/ceph-handler/tasks
16:29:18 <damiandabrowski> for ex. this one is responsible for restarting OSDs: https://github.com/ceph/ceph-ansible/blob/bb849a55861e3900362ec46e68a02754b2c892ec/roles/ceph-handler/templates/restart_osd_daemon.sh.j2
16:30:22 <noonedeadpunk> yup, already found that. They went quite the extra mile to protect against a naive playbook run
16:30:39 <noonedeadpunk> well, I mean. Then we can leave things as is indeed....
16:31:16 <noonedeadpunk> as basically ceph-ansible version bump or change of ceph_stable_release won't result in package upgrade unless you set `upgrade_ceph_packages`
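The conclusion above follows from the ceph-ansible defaults linked earlier: package upgrades stay opt-in. A hedged sketch (the flag is from the linked ceph-ansible defaults; the playbook invocation mirrors the command mentioned in the discussion):

```yaml
# upgrade_ceph_packages defaults to false in ceph-ansible, so a version bump
# alone does not upgrade ceph packages. An operator opts in explicitly at
# runtime, e.g.:
#   openstack-ansible ceph-install.yml -e upgrade_ceph_packages=true
upgrade_ceph_packages: false
```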
16:32:58 <noonedeadpunk> I still like the idea though that ceph-rgw-install should not be part of setup-openstack, as it has nothing to do with openstack... But I can live with the current state as well
16:33:05 <noonedeadpunk> oh, totally forgot
16:33:08 <noonedeadpunk> #endmeeting