15:01:23 #startmeeting devstack-cephadm 15:01:23 Meeting started Fri Jan 28 15:01:23 2022 UTC and is due to finish in 60 minutes. The chair is vkmc. Information about MeetBot at http://wiki.debian.org/MeetBot. 15:01:23 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. 15:01:23 The meeting name has been set to 'devstack_cephadm' 15:01:51 o/ 15:01:53 o/ 15:02:16 o/ 15:02:25 let's give people a couple of minutes to join 15:03:04 \o 15:03:50 o/ 15:04:24 o/ 15:04:38 #link https://etherpad.opendev.org/p/devstack-plugin-ceph 15:04:49 just set up an etherpad to take notes, if needed 15:05:16 let's get started 15:05:32 #topic motivations 15:05:47 so, as mentioned in the mailing list thread 15:07:24 we (as in the manila team) are aiming to implement a plugin for devstack to deploy a ceph cluster using the cephadm tool 15:07:34 this tool is developed and maintained by the ceph community 15:08:11 and it allows users to get specific ceph versions very easily and enforces good practices for ceph clusters 15:08:30 vkmc: yeah and it's a good idea as ceph-ansible is not used anymore (cephadm is the official deploy tool) 15:08:49 i would suggest pivoting the current plugin instead of developing a new one 15:08:53 agree :) 15:09:07 and there's also the orchestrator managing the ceph containers, so it's not just a matter of deployment 15:09:08 so that we only have to test one way of deploying ceph in the gate 15:09:13 because everyone else needs it, as this is the new way of deploying ceph 15:09:21 and we don't need to change all the jobs 15:09:27 all good reasons 15:09:35 seems we don't need to get more into detail on why we want to do this 15:09:45 +1 15:09:49 so, I guess, we are already covering this 15:09:51 but 15:10:06 #topic design approach 15:10:24 as sean-k-mooney mentions, instead of implementing two different plugins 15:10:36 i think for most projects how ceph is deployed is less relevant than that ceph is deployed.
we do care about the version to a degree, but not necessarily whether it comes from a container, a distro package, or an upstream ceph package 15:10:41 we should work on devstack-plugin-ceph as we know it, allowing users to toggle between the two 15:10:50 +1 15:11:09 and, as tosky mentions, it will be easier as well from a ci standpoint 15:11:16 so we don't need to change the job definitions 15:11:32 so, do we need the switch? 15:11:44 i think until it's stable, yes 15:12:00 maybe, with time, we will need to come up with a strategy to turn on the option to deploy with cephadm, start testing with that 15:12:12 i would suggest that for this cycle we likely don't want to enable cephadm by default until we are happy it works for the majority of the jobs 15:12:19 right 15:12:21 and slowly deprecate the current way of deploying things 15:12:26 agree 15:12:28 ya so similar to the ovn switch 15:12:38 which leads me to 15:12:40 fpantano: so just to clarify: is ceph-ansible still useful for pacific? I'd say it is, as we use it 15:12:46 #topic initial patch 15:12:53 and devstack-plugin-ceph is branched now so old stable branches can continue as they do now 15:13:05 tosky: afaik it's useful only for upgrade purposes 15:13:06 #link https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/826484 15:13:09 is there a specific mapping of what is compatible with what from the ceph, ceph-ansible and cephadm point of view? 15:13:17 tosky: we aren't really using ceph-ansible either with the old branches 15:13:29 the approach was even older 15:13:37 tbarron: ah, so ceph-ansible here is out of the picture already 15:13:42 oh 15:13:53 i think so, someone correct me if I'm wrong 15:14:12 tbarron: by "old branches" you also include the current master, right?
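[Editor's note] The toggle discussed above could look roughly like the sketch below. This is only an illustration: the variable name `CEPHADM_DEPLOY` and the function body are assumptions, since the actual naming was still being settled in the review under discussion (change 826484).

```shell
#!/bin/sh
# Hypothetical sketch of the deployment toggle discussed in the meeting.
# CEPHADM_DEPLOY is an assumed variable name; the real one is whatever
# the devstack-plugin-ceph review settles on.
CEPHADM_DEPLOY=${CEPHADM_DEPLOY:-False}

deploy_ceph() {
    if [ "$CEPHADM_DEPLOY" = "True" ]; then
        # New path: delegate cluster deployment to cephadm
        echo "deploying ceph with cephadm"
    else
        # Legacy path: install distro packages and configure by hand
        echo "deploying ceph from distro packages (legacy path)"
    fi
}

deploy_ceph
```

Defaulting the switch to the legacy path matches the consensus above: cephadm stays opt-in until it works for the majority of the jobs.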
Now that I think about it, the code installs the packages and does some magic 15:14:36 leseb authored a lot of early ceph-ansible and early devstack-plugin-ceph but the latter was never really adapted to use ceph-ansible 15:14:39 even on master 15:14:54 yeah, we don't use ceph-ansible in the plugin 15:15:13 ceph-ansible is being used by other deployment tools, such as tripleo 15:15:37 but in devstack-plugin-ceph we have been installing packages and configuring things manually 15:15:49 ok, so is it correct to say that the current method used by devstack-plugin-ceph just works because it happened not to use ceph-ansible, but it's a handcrafted set of basic steps? 15:15:49 does cephadm still have a dep on docker/podman on the host? i have not used it in a while 15:16:03 sean-k-mooney, it does 15:16:05 (sorry for the questions, just to have the full current status on the logs) 15:16:12 tosky: +1 (but check me folks) 15:16:16 sean-k-mooney: yes, it has a podman dep 15:16:22 ack 15:16:28 tosky++ 15:16:34 we should have that available in the relevant distros i guess 15:16:40 but devstack won't set it up by default 15:16:44 right 15:16:56 so the plugin will have to check and enable it; or will cephadm do that for us? 15:17:10 the less we have to maintain the better 15:17:20 sean-k-mooney, https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/826484/8/devstack/lib/cephadm#100 15:17:24 agree 15:17:27 it should really be "install and run", at least it was on Debian when I switched it - you may just need to enable some parameters depending on the kernel version used, but that could be figured out I guess 15:17:56 ack 15:18:17 vkmc: i'm not sure how many others have this use case but will you support multi-node ceph deployment as part of this work 15:18:51 we will at least need to copy the keys/ceph config in the zuul jobs to multiple nodes 15:19:12 sean-k-mooney, if we are currently testing that in ci, we will need to see how to do such a deployment with cephadm in
our ci, yes 15:19:17 but do you plan to support adding additional nodes with OSDs by running devstack and the plugin on multiple nodes 15:19:45 sean-k-mooney: ack, the current review is standalone, but in theory, having a 1 node cluster and ssh access to other nodes means you can enroll them as part of the cluster 15:19:47 sean-k-mooney: cephadm can do scale-out well 15:19:59 i think nova really only uses one ceph node (the controller) and copies the keys/config and installs the ceph client on the compute 15:20:34 sean-k-mooney: oh, you are talking about multiple ceph clients rather than a multi-node ceph cluster, right? 15:20:45 well potentially both 15:20:49 the former is used today 15:21:01 I don't see any issue either way. 15:21:07 the latter might be nice for some glance/cinder/grenade testing 15:21:37 sorry that is a bit off topic please continue 15:21:40 tbarron: me neither, let's have a plan (and maybe multiple reviews) to reach that status 15:21:54 fpantano: +1 15:22:05 fpantano++ 15:22:05 multiple ceph clients is probably the first step as it may be used in multinode jobs 15:22:29 s/may/will/ :) 15:22:49 +1 15:22:53 although i think we do that via roles in zuul today 15:23:12 so it might not be fully in scope for the plugin but rather the job definitions/roles 15:23:24 and probably that part won't change, but as fpantano says, we can look in future reviews 15:23:41 i believe how it works today is the first node completes, then we just copy the data to the others 15:23:56 so I take it this is a scenario needed by nova, right?
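[Editor's note] The "ssh access means you can enroll them" point follows the upstream cephadm host-management workflow. A rough sketch of those commands is below; this is the generic documented flow, not necessarily what the plugin or the zuul roles will end up doing, and the hostnames/IPs are placeholders.

```shell
# Sketch of cephadm scale-out (upstream workflow; host2/10.0.0.2 are
# placeholders). Run on the bootstrap node after the cluster is up.

# Distribute the cluster's ssh public key to the new node
ceph cephadm get-pub-key > ~/ceph.pub
ssh-copy-id -f -i ~/ceph.pub root@host2

# Enroll the node into the cluster via the orchestrator
ceph orch host add host2 10.0.0.2

# Let the orchestrator create OSDs on any unused devices
ceph orch apply osd --all-available-devices
```

For the ceph-client-only case that nova uses today (copy keys/config, install the client on the computes), none of this is needed, which is why that part can stay in the job definitions/roles.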
15:23:57 then stack on the remainder, so that should be pretty independent 15:24:17 the ceph multinode job does live/cold migration testing 15:24:37 so both computes need to be able to connect to the single ceph instance on the controller 15:25:04 that could be step 2 however, since we can use the legacy install mode for multinode for now 15:25:46 yeah, I'd say let's have a single node working, then multinode is the goal and the CI can be tested against this scenario 15:25:56 fpantano++ 15:27:47 ok, so... in theory, cephadm is a tool that can be used to deploy a working single node cluster, and then we can scale as needed 15:27:53 and test different scenarios 15:27:57 depending on what we need 15:28:21 I'm not familiar with all the different scenarios we need to test, I guess this varies between services 15:28:23 so that leads me to 15:28:43 shall we establish some SMEs for each of the services to work on this? 15:29:04 well one way to test is a DNM patch to the different services that flips the flag 15:29:16 if the jobs explode then we know there is work to do 15:29:41 zuul speculative execution is nice that way 15:29:42 s/if/when/ 15:29:47 :) 15:30:15 all right :) 15:30:29 the jobs currently defined in devstack-plugin-ceph already set up various services, I'd say having a cephadm version of them passing is step 1 15:31:31 +1 15:31:40 ++ /me not familiar with the current jobs but I agree, we can enable cephadm first and see how it goes and what we missed in the first place ^ 15:32:15 nova has 2 ceph jobs but they inherit from devstack-plugin-ceph-tempest-py3 and devstack-plugin-ceph-multinode-tempest-py3 15:32:29 the WIP patch can switch all of them, and when it's stable we can simply duplicate them 15:33:13 sean-k-mooney: so if you depend on the DNM patch which uses the new defaults you should see the results, and the same for the jobs in cinder which inherit from those devstack-plugin-ceph jobs 15:33:25 yes 15:33:47 we can have one patch to nova that does that and we
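[Editor's note] The "cephadm version of them" idea maps to a standard zuul job override. A sketch is below; the job name and the `CEPHADM_DEPLOY` localrc variable are assumptions (the real names depend on the plugin review), but the `parent`/`vars`/`devstack_localrc` structure is the usual devstack job pattern.

```yaml
# Hypothetical cephadm variant of the baseline job; variable name is
# an assumption pending the devstack-plugin-ceph review.
- job:
    name: devstack-plugin-ceph-tempest-cephadm
    parent: devstack-plugin-ceph-tempest-py3
    description: Baseline ceph tempest job, deployed via cephadm.
    vars:
      devstack_localrc:
        CEPHADM_DEPLOY: true
```

A DNM patch in a consuming project (nova, cinder, ...) can then use a `Depends-On:` footer pointing at the plugin change, and zuul's speculative execution runs the inherited jobs against the new deployment path.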
can recheck it and tweak until they are happy 15:34:19 so I see nova, cinder, manila folks here, but not the other consumer, glance. 15:34:32 * tbarron exempts rosmaita since he's cinder ptl 15:34:50 well also maybe ironic? 15:35:01 would be if I forgot them 15:35:06 do they support boot from ceph via the iscsi gateway 15:35:20 sean-k-mooney: i dunno. they talk about it in ptg. 15:35:46 i know they have cinder boot support with iscsi but no idea if they test that with ceph and the iscsi gateway 15:35:48 probably not 15:35:55 at least with devstack 15:36:23 nova's nova-ceph-multistore job 15:36:29 will test glance 15:36:30 * tbarron wonders if dansmith can be enticed to help if glance jobs blow up 15:36:37 so maybe that is ok for now 15:36:39 sean-k-mooney: a quick search with codesearch.openstack.org doesn't show any ironic repository 15:36:50 ack 15:37:06 https://github.com/openstack/nova/blob/master/.zuul.yaml#L467 15:37:08 * rosmaita is not paying any attention 15:37:19 he's bragging again 15:37:39 xD 15:38:19 ok so 15:38:22 so for those not familiar with that job, we configure glance with both the file and ceph backends and test direct usage of images backed by rbd, ensuring we don't download them 15:38:55 so that will fail if glance does not work 15:39:01 with cephadm 15:39:37 +1 thanks for clarifying ^ 15:41:53 James Parker proposed openstack/whitebox-tempest-plugin master: [WIP] Add multiple hugepage size job https://review.opendev.org/c/openstack/whitebox-tempest-plugin/+/825011 15:42:40 which jobs do we want to see passing before getting the cephadm script merged?
as an initial step I mean 15:43:07 my suggestion is the jobs in the repository itself 15:43:37 very well 15:43:52 so let's start from there 15:43:55 in fact there are: 15:44:12 (moment) 15:44:25 tbarron: sorry, just saw this, reading back 15:44:31 dansmith: ty 15:44:44 - devstack-plugin-ceph-tempest-py3 is the baseline 15:44:58 - devstack-plugin-ceph-multinode-tempest-py3 is its multinode version 15:45:29 - devstack-plugin-ceph-cephfs-native and devstack-plugin-ceph-cephfs-nfs are non voting, so it's up to the manila team whether they should be considered blocking 15:45:41 - there is also devstack-plugin-ceph-tempest-fedora-latest which is voting 15:45:57 yep devstack-plugin-ceph-tempest-py3 is the MVP i think, but that is really up to the plugin core team so just my 2 cents 15:46:05 - non voting: devstack-plugin-ceph-master-tempest (but that should probably not be too different from the basic tempest job at this point) 15:46:29 - experimental: devstack-plugin-ceph-compute-local-ephemeral (nova? What is the status of that?) 15:46:41 tbarron: oh, I don't know much about ceph really, I just worked on that job because we really needed test coverage there 15:46:52 that's the current status of https://opendev.org/openstack/devstack-plugin-ceph/src/branch/master/.zuul.yaml as far as I can see 15:47:11 tbarron: so yeah I can help if that job blows up in general, but I'm still not quite sure what is being discussed.. some change coming to ceph? 15:47:18 dansmith, tbarron I will have a look if there is something wrong 15:47:26 abhishekk++ 15:47:59 dansmith: tl;dr right now devstack-plugin-ceph deploys ceph manually; it has never used the more blessed way (ceph-ansible), and there is a new official way (cephadm) 15:48:10 abhishekk: dansmith excellent.
The idea is to change devstack-plugin-ceph to use cephadm to deploy the target ceph cluster 15:48:13 dansmith: we are discussing implementing support for cephadm for the deployment 15:48:16 ah 15:48:45 in theory on the client side (e.g. glance) everything would "just work" 15:48:48 heh 15:48:57 well, I don't think that job does anything special during setup, it just wires nova and glance together to do the good stuff 15:49:03 dansmith: and a first review has been submitted: https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/826484 15:49:17 so if you don't a-break-a the setup, I doubt it will a-break-a the testing :) 15:49:27 ++ 15:49:37 "special during *ceph* setup" I meant 15:51:07 yup, but it will be good to know whom to consult when the tests break anyway and it's not clear why 15:52:03 tbarron, you can ping me (I will be around 1800 UTC) 15:52:05 I can deflect with the best of them, so yeah.. count me in :) 15:52:53 dansmith++ 15:52:55 great :) 15:53:56 so abhishekk and dansmith for glance, tosky, rosmaita, eharney and enriquetaso for cinder, sean-k-mooney for nova, tbarron, gouthamr and vkmc for manila 15:54:05 i may or may not join this meeting every week but feel free to ping me if needed 15:54:32 once we toggle jobs and if we see issues on specific jobs 15:54:45 yep 15:55:07 sean-k-mooney, good thing you mention this, we probably won't meet every week...
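[Editor's note] For readers unfamiliar with the tool under discussion, a rough sketch of the single-node deployment cephadm performs follows the upstream documented workflow. The IP is a placeholder, and the exact flags the plugin will use were still being decided in the review.

```shell
# Sketch of a single-node cephadm deployment (upstream workflow;
# 10.0.0.1 is a placeholder for the host's monitor IP).

# Bootstrap a minimal cluster: one mon and one mgr, run as containers
# (this is where the docker/podman dependency mentioned above comes in)
cephadm bootstrap --mon-ip 10.0.0.1

# Have the orchestrator create OSDs on any unused devices
ceph orch apply osd --all-available-devices

# Check cluster health
ceph -s
```

This matches the earlier point that cephadm gives you a working single-node cluster first, which can then be scaled out as needed.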
maybe our next sync would be in one month from now, depending on how fast we get the initial things merged 15:55:12 I'll continue to communicate on the mailing list 15:55:12 and once the first results are promising, we can trigger a few other component-specific jobs in the new mode 15:55:27 tosky++ sounds good 15:55:39 +1 we have a plan :D 15:55:43 +1 we do 15:56:14 vkmc: technically we also want to hear from the QA people, as they are also core on that repository iirc 15:56:18 a little bird told me carloss is joining us on the manila side as well :D 15:56:20 kopecmartin, gmann ^^ 15:56:27 :D 15:56:31 carloss++ 15:57:19 ok, 3 more minutes til the top of the hour, so let's wrap up here 15:57:21 and also because the change impacts jobs which vote on tempest.git itself 15:57:38 next action item is to get reviews for that initial patch and start enabling it for the different jobs we have in that repo 15:57:51 and we can sync up again in a few weeks 15:58:23 enable for testing, and when they are stable, start flipping them in a stable way? 15:58:23 anything else? 15:58:34 tosky, yes 15:59:10 perfect, thanks 15:59:24 great! 15:59:42 #endmeeting