15:01:23 <vkmc> #startmeeting devstack-cephadm
15:01:23 <opendevmeet> Meeting started Fri Jan 28 15:01:23 2022 UTC and is due to finish in 60 minutes.  The chair is vkmc. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:23 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:23 <opendevmeet> The meeting name has been set to 'devstack_cephadm'
15:01:51 <fmount> o/
15:01:53 <tosky> o/
15:02:16 <vkmc> o/
15:02:25 <vkmc> let's give people a couple of minutes to join
15:03:04 <enriquetaso> \o
15:03:50 <rosmaita> o/
15:04:24 <carloss> o/
15:04:38 <vkmc> #link https://etherpad.opendev.org/p/devstack-plugin-ceph
15:04:49 <vkmc> just set an etherpad to take notes, if needed
15:05:16 <vkmc> let's get started
15:05:32 <vkmc> #topic motivations
15:05:47 <vkmc> so, as mentioned in the mailing list thread
15:07:24 <vkmc> we (as in manila team) are aiming to implement a plugin for devstack to deploy a ceph cluster using the cephadm tool
15:07:34 <vkmc> this tool is developed and maintained by the ceph community
15:08:11 <vkmc> and it allows users to get specific ceph versions very easily and enforces good practices for ceph clusters
15:08:30 <fpantano> vkmc: yeah and it's a good idea as ceph-ansible is not used anymore (cephadm is the official deploy tool)
15:08:49 <sean-k-mooney> i would suggest pivoting the current plugin instead of developing a new one
15:08:53 <vkmc> agree :)
15:09:07 <fpantano> and there's also the orchestrator managing the ceph containers, so it's not just a matter of deploy
15:09:08 <sean-k-mooney> so that we only have to test with one way of deploying ceph in the gate
15:09:13 <tosky> because everyone else needs it, as this is the new way for deploying ceph
15:09:21 <tosky> and we don't need to change all the jobs
15:09:27 <vkmc> all good reasons
15:09:35 <vkmc> seems we don't need to get more into detail on why we want to do this
15:09:45 <fpantano> +1
15:09:49 <vkmc> so, I guess, we are already covering this
15:09:51 <vkmc> but
15:10:06 <vkmc> #topic design approach
15:10:24 <vkmc> as sean-k-mooney mentions, instead of implementing two different plugins
15:10:36 <sean-k-mooney> i think for most projects how ceph is deployed is less relevant than the fact that ceph is deployed. we do care about the version to a degree but not necessarily whether it's in a container, a distro package or an upstream ceph package
15:10:41 <vkmc> we should work on devstack-ceph-plugin as we know it, allowing users to toggle between the two
15:10:50 <sean-k-mooney> +1
15:11:09 <vkmc> and, as tosky mentions, it will be easier as well from a ci standpoint
15:11:16 <vkmc> so we don't need to change the jobs definitions
15:11:32 <tosky> so, do we need the switch?
15:11:44 <sean-k-mooney> i think until its stable yes
15:12:00 <vkmc> maybe, with time, we will need to come up with a strategy to turn on the option to deploy with cephadm, start testing with that
15:12:12 <sean-k-mooney> i would suggest that for this cycle we likely don't want to enable cephadm by default until we are happy it works for the majority of the jobs
15:12:19 <tosky> right
15:12:21 <vkmc> and slowly deprecate the current way of deploying things
15:12:26 <fpantano> agree
15:12:28 <sean-k-mooney> ya so similar to the ovn switch
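For context, the toggle being discussed here would most likely end up as a single boolean in the devstack local.conf that the plugin checks at stack time. A minimal sketch, assuming a variable named CEPHADM_DEPLOYMENT (the name is illustrative only; the real one is whatever the patch under review defines):

    [[local|localrc]]
    enable_plugin devstack-plugin-ceph https://opendev.org/openstack/devstack-plugin-ceph
    # False (default): keep the existing package-based ceph deployment
    # True: bootstrap the cluster with cephadm instead
    CEPHADM_DEPLOYMENT=True

A DNM patch in a consuming project (see the discussion further down) would flip the same variable through the job's devstack_localrc section rather than touching the plugin itself.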
15:12:38 <vkmc> which leads me to
15:12:40 <tosky> fpantano: so just to clarify: is ceph-ansible still useful for pacific? I'd say it is, as we use it
15:12:46 <vkmc> #topic initial patch
15:12:53 <tbarron> and devstack-plugin-ceph is branched now so old stable branches can continue as they do now
15:13:05 <fpantano> tosky: afaik it's useful only for upgrades purposes
15:13:06 <vkmc> #link https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/826484
15:13:09 <tosky> is there a specific mapping of what is compatible with what from the ceph, ceph-ansible and cephadm point of view?
15:13:17 <tbarron> tosky: we aren't really using ceph-ansible either with the old branches
15:13:29 <tbarron> the approach was even older
15:13:37 <fpantano> tbarron: ah, so ceph-ansible here is out of the picture already
15:13:42 <tosky> oh
15:13:53 <tbarron> i think so, someone correct me if I'm wrong
15:14:12 <tosky> tbarron: by "old branches" do you also mean the current master? Now that I think about it, the code installs the packages and does some magic
15:14:36 <tbarron> leseb authored a lot of early ceph-ansible and early devstack-plugin-ceph but the latter was not really adapted to use ceph-ansible
15:14:39 <tbarron> even on master
15:14:54 <vkmc> yeah, we don't use ceph-ansible in the plugin
15:15:13 <vkmc> ceph-ansible is being used for other deployment tools, such as a tripleo
15:15:37 <vkmc> but in the devstack-plugin-ceph we have been installing packages and configuring things manually
15:15:49 <tosky> ok, so is it correct to say that the current method used by devstack-plugin-ceph just works because it happened not to use ceph-ansible, but it's a handcrafted set of basic steps?
15:15:49 <sean-k-mooney> does cephadm still have a dep on docker/podman on the host. i have not used it in a while
15:16:03 <vkmc> sean-k-mooney, it does
15:16:05 <tosky> (sorry for the questions, just to have the full current status on the logs)
15:16:12 <tbarron> tosky: +1 (but check me folks)
15:16:16 <fpantano> sean-k-mooney: yes, it has podman dep
15:16:22 <sean-k-mooney> ack
15:16:28 <vkmc> tosky++
15:16:34 <sean-k-mooney> we should have that available in the relevant distros i guess
15:16:40 <sean-k-mooney> but devstack wont set it up by default
15:16:44 <vkmc> right
15:16:56 <sean-k-mooney> so the plugin will have to check and enable it, although maybe cephadm will do that for us
15:17:10 <sean-k-mooney> the less we have to maintain the better
15:17:20 <vkmc> sean-k-mooney, https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/826484/8/devstack/lib/cephadm#100
15:17:24 <vkmc> agree
15:17:27 <tosky> it should really be "install and run", at least it was on Debian when I switched it - you may just need to enable some parameters depending on the kernel version used, but that could be figured out I guess
15:17:56 <sean-k-mooney> ack
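For reference, the manual equivalent of what the plugin has to automate on a single node is fairly short. This is a rough sketch only; package names and bootstrap flags vary by distro and ceph release:

    # install the container runtime and the cephadm bootstrap script
    sudo dnf install -y podman cephadm          # apt-get on Debian/Ubuntu
    # bootstrap a one-node cluster; --single-host-defaults relaxes the
    # replication/placement defaults so a single host reports healthy
    sudo cephadm bootstrap --mon-ip 192.0.2.10 --single-host-defaults
    # the client config and admin keyring land under /etc/ceph/

On top of that the plugin still has to pick the ceph release/container image and create the pools and keyrings the OpenStack services expect, as the current code does.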
15:18:17 <sean-k-mooney> vkmc: i'm not sure how many others have this usecase but will you support multi-node ceph deployment as part of this work
15:18:51 <sean-k-mooney> we at least will need to copy the keys/ceph config in the zuul jobs to multiple nodes
15:19:12 <vkmc> sean-k-mooney, if we are currently testing that in ci, we will need to see how to do such a deployment with cephadm in our ci, yes
15:19:17 <sean-k-mooney> but do you plan to support adding additional nodes with osds by running devstack and the plugin on multiple nodes
15:19:45 <fpantano> sean-k-mooney: ack, the current review is standalone, but in theory, having 1 node cluster and ssh access to other nodes means you can enroll them as part of the cluster
15:19:47 <tbarron> sean-k-mooney: cephadm can do scale-out well
15:19:59 <sean-k-mooney> i think nova really only uses one ceph node (the controller) and copies the keys/config and installs the ceph client on the compute
15:20:34 <tbarron> sean-k-mooney: oh, you are talking multiple ceph clients rather than multi-node ceph cluster, right?
15:20:45 <sean-k-mooney> well potentially both
15:20:49 <sean-k-mooney> the former is used today
15:21:01 <tbarron> I don't see any issue either way.
15:21:07 <sean-k-mooney> the latter might be nice for some glance/cinder/grenade testing
15:21:37 <sean-k-mooney> sorry that is a bit off topic please continue
15:21:40 <fpantano> tbarron: me neither, let's have a plan (and maybe multiple reviews) to reach that status
15:21:54 <tbarron> fpantano: +1
15:22:05 <vkmc> fpantano++
15:22:05 <tosky> multiple ceph clients is probably the first step as it may be used in multinode jobs
15:22:29 <sean-k-mooney> s /may/will :)
15:22:49 <fpantano> +1
15:22:53 <sean-k-mooney> although i think we do that via roles in zuul today
15:23:12 <sean-k-mooney> so it might not be fully in scope for the plugin but rather the job definitions/roles
15:23:24 <tbarron> and probably that part won't change, but as fpantano says, can look in future reviews
15:23:41 <sean-k-mooney> i believe how it works today is the first node completes, then we just copy the data to the others
15:23:56 <vkmc> so I take it this is a scenario needed by nova, right?
15:23:57 <sean-k-mooney> then stack on the remainder, so that should be pretty independent
15:24:17 <sean-k-mooney> the ceph multinode job does live/cold migration testing
15:24:37 <sean-k-mooney> so both computes need to be able to connect to the single ceph instance on the controller
15:25:04 <sean-k-mooney> that could be step 2 however, since we can use the legacy install mode for multinode for now
15:25:46 <fpantano> yeah, I'd say let's have a single node working, then multinode is the goal and the CI can be tested against this scenario
15:25:56 <vkmc> fpantano++
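To make the multinode point concrete, enrolling a second host into a cephadm cluster is mostly an ssh-key exchange plus one orchestrator call, while a plain client only needs the config and a keyring. A sketch with placeholder hostnames, IPs and keyring names:

    # on the bootstrap node: allow cephadm to ssh into the new host, then enroll it
    sudo ceph cephadm get-pub-key | ssh root@compute1 'cat >> ~/.ssh/authorized_keys'
    sudo ceph orch host add compute1 192.0.2.11
    # a compute acting only as a ceph client just needs the conf and a keyring copied
    scp /etc/ceph/ceph.conf /etc/ceph/ceph.client.cinder.keyring compute1:/etc/ceph/

The second (client-only) path is essentially what the multinode zuul jobs do today by copying keys/config to the other nodes, so it should keep working regardless of how the cluster itself is deployed.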
15:27:47 <vkmc> ok, so... in theory, cephadm is a tool that can be used to get a single node cluster working, and then we can scale as needed
15:27:53 <vkmc> and test different scenarios
15:27:57 <vkmc> depending on what we need
15:28:21 <vkmc> I'm not familiar with all the different scenarios we need to test, I guess this varies between services
15:28:23 <vkmc> so that leads me to
15:28:43 <vkmc> shall we establish some SMEs for each of the services to work on this?
15:29:04 <sean-k-mooney> well one way to test is a DNM patch to the different services that flips the flag
15:29:16 <sean-k-mooney> if the jobs explode then we know there is work to do
15:29:41 <sean-k-mooney> zuul speculative execution is nice that way
15:29:42 <tbarron> s/if/when/
15:29:47 <sean-k-mooney> :)
15:30:15 <vkmc> all right :)
15:30:29 <tosky> the jobs currently defined in devstack-plugin-ceph already set up various services, I'd say having a cephadm version of them passing is step 1
15:31:31 <tbarron> +1
15:31:40 <fpantano> ++ /me not familiar with the current jobs but I agree, we can enable cephadm first and see how it goes and what we missed in the first place ^
15:32:15 <sean-k-mooney> nova has 2 ceph jobs but both inherit from devstack-plugin-ceph-tempest-py3 and devstack-plugin-ceph-multinode-tempest-py3
15:32:29 <tosky> the WIP patch can switch all of them, and when it's stable we can simply duplicate them
15:33:13 <tosky> sean-k-mooney: so if you depend on the DNM patch which uses the new defaults you should see the results, and the same for the jobs in cinder which inherit from those devstack-plugin-ceph jobs
15:33:25 <sean-k-mooney> yes
15:33:47 <sean-k-mooney> we can have one patch to nova that does that and we can recheck it and tweak it until the jobs are happy
15:34:19 <tbarron> so I see nova, cinder, manila folks here, but not the other consumer, glance.
15:34:32 * tbarron exempts rosmaita since he's cinder ptl
15:34:50 <sean-k-mooney> well also maybe ironic?
15:35:01 <tbarron> would be if I forgot them
15:35:06 <sean-k-mooney> do they support boot from ceph via the iscsi gateway
15:35:20 <tbarron> sean-k-mooney: i dunno.  they talk about it in ptg.
15:35:46 <sean-k-mooney> i know they have cinder boot support with iscsi but no idea if they test that with ceph and the iscsi gateway
15:35:48 <sean-k-mooney> probably not
15:35:55 <sean-k-mooney> at least with devstack
15:36:23 <sean-k-mooney> nova's  nova-ceph-multistore job
15:36:29 <sean-k-mooney> will test glance
15:36:30 * tbarron wonders if dansmith can be enticed to help if glance jobs blow up
15:36:37 <sean-k-mooney> so maybe that is ok for now
15:36:39 <tosky> sean-k-mooney: a quick search with codesearch.openstack.org doesn't show any ironic repository
15:36:50 <sean-k-mooney> ack
15:37:06 <sean-k-mooney> https://github.com/openstack/nova/blob/master/.zuul.yaml#L467
15:37:08 * rosmaita is not paying any attention
15:37:19 <tbarron> he's bragging again
15:37:39 <vkmc> xD
15:38:19 <vkmc> ok so
15:38:22 <sean-k-mooney> so for those not familiar with that job: we configure glance with both the file and ceph backends and test direct usage of images backed by rbd, ensuring we don't download them
15:38:55 <sean-k-mooney> so that will fail if glance does not work
15:39:01 <sean-k-mooney> with cephadm
15:39:37 <fpantano> +1 thanks for clarifying ^
15:42:40 <vkmc> which jobs do we want to see passing before getting the cephadm script merged? as an initial step I mean
15:43:07 <tosky> my suggestion is the jobs in the repository itself
15:43:37 <vkmc> very well
15:43:52 <vkmc> so let's start from there
15:43:55 <tosky> in fact there are:
15:44:12 <tosky> (moment)
15:44:25 <dansmith> tbarron: sorry, just saw this, reading back
15:44:31 <tbarron> dansmith: ty
15:44:44 <tosky> - devstack-plugin-ceph-tempest-py3 is the baseline
15:44:58 <tosky> - devstack-plugin-ceph-multinode-tempest-py3 is its multinode version
15:45:29 <tosky> - devstack-plugin-ceph-cephfs-native and devstack-plugin-ceph-cephfs-nfs are non voting, so up to the manila team whether they should be considered as blocking
15:45:41 <tosky> - there is also devstack-plugin-ceph-tempest-fedora-latest which is voting
15:45:57 <sean-k-mooney> yep devstack-plugin-ceph-tempest-py3 is the MVP i think but that is really up to the plugin core team so just my 2 cents
15:46:05 <tosky> - non voting: devstack-plugin-ceph-master-tempest (but that should probably not be too different from the basic tempest job at this point)
15:46:29 <tosky> - experimental: devstack-plugin-ceph-compute-local-ephemeral (nova? What is the status of that?)
15:46:41 <dansmith> tbarron: oh, I don't know much about ceph really, I just worked on that job because we really needed test coverage there
15:46:52 <tosky> that's the current status of https://opendev.org/openstack/devstack-plugin-ceph/src/branch/master/.zuul.yaml as far as I can see
15:47:11 <dansmith> tbarron: so yeah I can help if that job blows up in general, but I'm still not quite sure what is being discussed.. some change coming to ceph?
15:47:18 <abhishekk> dansmith, tbarron I will have a look if there is something wrong
15:47:26 <vkmc> abhishekk++
15:47:59 <tosky> dansmith: tl;dr right now devstack-plugin-ceph deploys ceph manually; it has never used the more blessed way (ceph-ansible), and there is a new official way (cephadm)
15:48:10 <tbarron> abhishekk: dansmith excellent.  The idea is to change devstack-plugin-ceph to use cephadm to deploy the target ceph cluster
15:48:13 <tosky> dansmith: we are discussing implementing the support for cephadm for the deployment
15:48:16 <dansmith> ah
15:48:45 <tbarron> in theory client side (e.g. glance) everything would "just work"
15:48:48 <tbarron> heh
15:48:57 <dansmith> well, I don't think that job does anything special during setup, it just wires nova and glance together to do the good stuff
15:49:03 <fpantano> dansmith: and a first review has been submitted: https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/826484
15:49:17 <dansmith> so if you don't a-break-a the setup, I doubt it will a-break-a the testing :)
15:49:27 <abhishekk> ++
15:49:37 <dansmith> "special during *ceph* setup I meant"
15:51:07 <tbarron> yup, but it will be good to know whom to consult when the tests break anyways and it's not clear why
15:52:03 <abhishekk> tbarron, you can ping me (I will be around 1800 UTC)
15:52:05 <dansmith> I can deflect with the best of them, so yeah.. count me in :)
15:52:53 <vkmc> dansmith++
15:52:55 <vkmc> great :)
15:53:56 <vkmc> so abhishekk and dansmith for glance, tosky, rosmaita, eharney and enriquetaso for cinder, sean-k-mooney for nova, tbarron, gouthamr and vkmc for manila
15:54:05 <sean-k-mooney> i may or may not join this meeting every week but feel free to ping me if needed
15:54:32 <vkmc> once we toggle jobs and if we see issues on specific jobs
15:54:45 <tosky> yep
15:55:07 <vkmc> sean-k-mooney, good thing you mention this, we probably won't meet every week... maybe our next sync would be in one month from now, depending on how fast we get initial things merged
15:55:12 <vkmc> I'll continue to communicate on the mailing list
15:55:12 <tosky> and once the first results are promising, we can trigger a few other component-specific jobs in the new mode
15:55:27 <vkmc> tosky++ sounds good
15:55:39 <fpantano> +1 we have a plan :D
15:55:43 <vkmc> +1 we do
15:56:14 <tosky> vkmc: technically we also want to hear from QA people, as they are also core on that repository iirc
15:56:18 <vkmc> a little bird told me carloss is joining us on the manila side as well :D
15:56:20 <tosky> kopecmartin, gmann  ^^
15:56:27 <carloss> :D
15:56:31 <vkmc> carloss++
15:57:19 <vkmc> ok, 3 more minutes til top of the hour, so let's wrap here
15:57:21 <tosky> and also because the change impacts jobs which vote on tempest.git itself
15:57:38 <vkmc> next action item is to get review for that initial patch and start enabling it for the different jobs we have in that repo
15:57:51 <vkmc> and we can sync up again in a few weeks
15:58:23 <tosky> enable for testing, and when they are stable, start flipping them in a stable way?
15:58:23 <vkmc> anything else?
15:58:34 <vkmc> tosky, yes
15:59:10 <tosky> perfect, thanks
15:59:24 <vkmc> great!
15:59:42 <vkmc> #endmeeting