16:00:23 <jungleboyj> #startmeeting Cinder
16:00:23 <openstack> Meeting started Wed Jun 19 16:00:23 2019 UTC and is due to finish in 60 minutes. The chair is jungleboyj. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:24 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:26 <openstack> The meeting name has been set to 'cinder'
16:00:28 <whoami-rajat> Hi
16:00:32 <enriquetaso> hi
16:00:32 <smcginnis> o/
16:00:38 <jungleboyj> courtesy ping: jungleboyj whoami-rajat rajinir lseki carloss pots woojay erlon geguileo eharney rosmaita enriquetaso e0ne smcginnis davidsha walshh_ xyang hemna _hemna
16:00:42 <jungleboyj> @!
16:00:43 <_pewp_> jungleboyj (◍˃̶ᗜ˂̶◍)ノ”
16:00:45 <geguileo> hi! o/
16:00:46 <walshh_> hi
16:00:53 <lseki> hi
16:01:14 <carloss> hi
16:01:37 <woojay> hi
16:01:45 <jungleboyj> Hello everyone.
16:02:42 <jungleboyj> Give people one more moment to show up.
16:03:15 <e0ne> hi
16:03:29 <jungleboyj> Ok, now e0ne is here. We can start. :-)
16:03:38 <e0ne> :)
16:03:50 <e0ne> #link https://etherpad.openstack.org/p/cinder-train-meetings
16:03:59 <jungleboyj> e0ne:
16:04:00 * e0ne feels useful today :)
16:04:00 <jungleboyj> Thanks.
16:04:07 <jungleboyj> #announcements
16:04:14 <jungleboyj> #topic announcements
16:04:21 <jungleboyj> As you can see, not a lot today.
16:04:44 <jungleboyj> Just the reminder that the requirement to have 3rd Party CI running on Py3 is coming up in about a month.
16:04:58 <jungleboyj> Will start doing an audit of systems soon.
16:05:20 <e0ne> jungleboyj, smcginnis: do you know how many 3rd party CIs use python3?
16:05:30 <jungleboyj> e0ne: I haven't actually looked yet.
16:05:59 <smcginnis> e0ne: I've seen a couple at least, but I've been afraid to really go look at them all.
16:06:15 <jungleboyj> smcginnis: Same here.
16:06:54 <smcginnis> There are several driver maintainers here. Anyone have any progress or updates to report on this?
16:07:08 <jungleboyj> ++
16:08:12 <smcginnis> Py37 is an officially supported runtime for Train, so if driver maintainers don't make sure they work right with it, there could be a lot of pain in a few months when customers want to try out the new release.
16:08:12 <walshh_> dell_emc have moved from py36 to py37
16:08:23 <smcginnis> walshh_: Awesome - thanks!
16:09:22 <jungleboyj> That is good.
16:09:30 <jungleboyj> rajinir: Anything from your team?
16:10:40 <jungleboyj> Hmmm.
16:10:43 <jungleboyj> Anyone else?
16:10:48 <woojay> LINSTOR is 3.7
16:10:55 <jungleboyj> woojay: Woot woot!
16:11:01 <jungleboyj> Thanks.
16:11:29 <smcginnis> woojay: Nice! :)
16:12:27 * _erlon_ sneaks ...
16:12:40 <dviroel> e0ne: We (NetApp) are using python3 on our CI.
16:12:53 <e0ne> great!
16:13:11 <jungleboyj> Good. So, we aren't asking for something that doesn't work.
16:14:00 <jungleboyj> So, hopefully others have it working too and just aren't in the meeting.
16:14:07 <_erlon_> jungleboyj: no, it works, we just needed to set USE_PYTHON3=True in our jobs
16:14:43 <walshh_> I believe legacy Dell drivers are in the process of changing to py37 also. It may even be complete.
16:15:12 <jungleboyj> Good.
16:15:29 <smcginnis> dviroel: 3.7?
16:15:56 <_erlon_> 3.6 or 3.7, not sure
16:16:11 <dviroel> smcginnis: need to check that
16:16:48 <smcginnis> OK, it needs to be 3.7, but if it's already using 3.6 then that shouldn't be as big of a hurdle.
16:16:53 <jungleboyj> Ok. Any further discussion needed on that then?
16:16:54 <_erlon_> smcginnis: 3.5 actually
16:17:04 <smcginnis> OK, so a little work to do there yet.
16:17:14 <_erlon_> I think devstack should allow you to set the version. We are just using the default.
16:17:32 <smcginnis> It just depends on what version is available on your platform.
16:17:44 <smcginnis> Nothing controlled by devstack other than USE_PYTHON3=true
16:17:50 <_erlon_> smcginnis: hmm, jobs still running on xenial
16:18:32 <_erlon_> smcginnis: if 3.7 is not available there and we need to move to ubuntu 18 it will be a big leap
16:19:00 <smcginnis> You probably should move though. Xenial isn't a supported version anymore.
16:19:10 <whoami-rajat> i thought CIs moved to Bionic?
16:19:22 <smcginnis> jungleboyj: Guess this should have been a full topic and not just an announcement. :D
16:19:53 <_erlon_> smcginnis: jungleboyj: I know 3.7 is the recommended, but is that a requirement for Train?
16:19:55 <jungleboyj> :-) Well, it is good we are having the discussion.
16:20:10 <_erlon_> moving to ubuntu 18 will take us some time
16:20:17 <smcginnis> _erlon_: Yes, that's what we've stated several times over the last few months.
16:20:18 <rajinir> walshh_, jungleboyj: sorry I'm late. All Dell EMC CI has moved to 3.7
16:20:29 <smcginnis> rajinir: Thanks!
16:20:34 <jungleboyj> rajinir: No problem. Thank you!
16:20:34 <_erlon_> whoami-rajat: if you move them, it's moved :P
16:20:41 <e0ne> #link https://governance.openstack.org/tc/reference/runtimes/train.html
16:21:06 <e0ne> ubuntu 18.04, python 3.5 and 3.7 should be supported
16:21:20 <smcginnis> 3.6 and 3.7, not 3.5
16:21:24 <_erlon_> smcginnis: damn, I just picked the python 3 part
16:21:39 <e0ne> smcginnis: sure, just a typo :(
16:22:09 <smcginnis> And can run on whatever you can install py3.7 on, but if you want to actually run what customers are being told is supported, that's Bionic, CentOS/RHEL7, or openSUSE Leap 15.
16:22:14 <smcginnis> e0ne: ;)
16:22:51 <rajinir> There are two flags in devstack: USE_PYTHON3 and PYTHON3_VERSION
16:23:15 <smcginnis> Oh, great.
16:23:32 <smcginnis> https://docs.openstack.org/devstack/latest/configuration.html#id19
16:24:19 <_erlon_> rajinir: great! Good to know, thanks
16:24:33 <_erlon_> I'll give that a shot.
16:24:44 <jungleboyj> Cool.
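[Editor's note: the two devstack flags named above can be set together in a job's local.conf. A minimal sketch; `USE_PYTHON3` and `PYTHON3_VERSION` come from the devstack configuration docs linked above, and everything else here is illustrative:]

```ini
# local.conf fragment for a 3rd party CI devstack job (sketch).
# USE_PYTHON3 and PYTHON3_VERSION are the two devstack settings
# mentioned above; 3.7 matches the Train supported runtimes.
[[local|localrc]]
USE_PYTHON3=True
PYTHON3_VERSION=3.7
```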
16:25:13 <jungleboyj> Anything more on Py3?
16:26:06 <jungleboyj> I take that as a no.
16:26:21 <jungleboyj> Thank you all vendors for your input and for participating here.
16:26:32 <jungleboyj> Appreciate all of you that stay active.
16:27:52 <jungleboyj> Ok. So ...
16:27:59 <jungleboyj> #topic Open Discussion
16:28:13 <jungleboyj> Anything else that people want to discuss?
16:28:48 <_erlon_> I do
16:28:53 <jungleboyj> _erlon_: Sure.
16:29:16 <_erlon_> I would like to get some input on the replication interaction between Nova and Cinder.
16:29:33 <_erlon_> Maybe there's someone here with some background and experience on that.
16:29:59 <_erlon_> I'm trying to understand if it is possible to do a dynamic re-discovery of volumes after a failover/failback.
16:30:02 <jungleboyj> What kind of information are you looking for?
16:30:13 <jungleboyj> Hmmm ...
16:31:00 <jungleboyj> geguileo: You know that code path to some extent ...
16:31:18 <_erlon_> I know that currently it's said that you need to detach and re-attach the volumes, but this seems to be a very manual process and I *think* it's not possible for volume-backed VMs.
16:31:26 <geguileo> there is no interaction afaik
16:31:55 <geguileo> _erlon_: if you mean boot-from-volume VMs, yes, you are correct
16:32:25 <jungleboyj> Right. Limited functionality there.
16:32:26 <geguileo> I think I wrote down the current limitations somewhere...
16:32:37 <_erlon_> geguileo: because it's not implemented or because it's not possible?
16:32:38 <_erlon_> I was trying to imagine how it would be possible, but I think it might not be.
16:34:17 <jungleboyj> For volume-backed VMs I think it is highly unlikely that it could work.
16:34:44 <_erlon_> jungleboyj: yep, so one would have to re-create those kinds of VMs.
16:34:55 <jungleboyj> _erlon_: Correct.
16:35:01 <smcginnis> It's all very manual right now, kind of on purpose.
16:35:07 <geguileo> _erlon_: I believe this falls in the same category as changing the root volume contents.
16:35:09 <jungleboyj> smcginnis: ++
16:35:18 <smcginnis> Past attempts had tried to bite off too much, so we wanted a minimally viable feature.
16:35:39 <geguileo> _erlon_: so it could be implemented the same way we are working now to reimage the root volume
16:35:42 <smcginnis> I know folks want something like VMware SRM, but that's not at all what we provide and it would take a LOT of work to get there.
16:35:47 <geguileo> _erlon_: but in this case it would need to reattach it
16:36:13 <jungleboyj> :-(
16:36:44 <_erlon_> smcginnis: it can continue to be manual, but only triggered manually, not the whole process, which sometimes might require manual intervention in the DB.
16:37:18 <_erlon_> What is people's experience with what we have so far
16:37:20 <_erlon_> ?
16:37:34 <_erlon_> I hope nobody ever needed to use it lol
16:37:37 <jungleboyj> _erlon_: That is a good question. Not sure how many people are actually using it.
16:38:42 <jungleboyj> After all the work to get it there ... if no one is using it. *Sigh*
16:38:46 <_erlon_> jungleboyj: we found out these last days that our SF driver has some flaws that would make it hardly possible for someone to use it, so we believe nobody has ever really even tried.
16:39:26 <jungleboyj> Probably true.
16:39:45 <geguileo> _erlon_: replication at this point is mainly to ensure you don't lose data
16:40:00 <geguileo> but to recover, that would require a lot of knowledge and manual steps
16:40:04 <geguileo> afaik
16:40:08 <_erlon_> jungleboyj: I'm putting two meanings on the word 'using': I would expect people to use it in the sense of having it ready for a disaster, but not really having to do a failover.
16:40:31 <jungleboyj> Ah, true enough.
16:41:18 <geguileo> it was discussed at some point to have a feature to "try the failover/failback"
16:41:39 <jungleboyj> Yes. Wasn't some of that implemented?
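[Editor's note: "having it ready for a disaster" means configuring a replication target on the backend in cinder.conf. A hedged sketch of the generic v2.1 ("cheesecake") form; `backend_id` is the standard key, but the remaining `replication_device` keys are driver-specific and every value below is a made-up placeholder, not a real deployment:]

```ini
# cinder.conf fragment (sketch): a backend with a replication target.
# Section name, backend name, addresses, and credentials are
# illustrative placeholders; keys after backend_id vary per driver.
[myreplicated]
volume_backend_name = myreplicated
replication_device = backend_id:secondary_site,san_ip:203.0.113.10,san_login:admin,san_password:secret
```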
16:41:49 <_erlon_> geguileo: yes, that was the first point in replication v1: to be able to continually test the process.
16:41:51 <geguileo> to ensure that it was actually working
16:42:24 <_erlon_> but with the backend-based replication in v2 this was left behind
16:43:03 <jungleboyj> Ah, that is right.
16:44:06 <_erlon_> it is possible to do it; you would need to set up a backend using the same arrays and then keep failing it over back and forth periodically
16:44:33 <jungleboyj> So, I think what we are getting at here is that if this hasn't been tested by vendors, it probably hasn't been tested.
16:44:42 <_erlon_> :)
16:45:20 <jungleboyj> That is not good.
16:45:37 <geguileo> jungleboyj: you mean that the failover and failback haven't been tested?
16:45:48 <jungleboyj> geguileo: Not sure.
16:45:52 <geguileo> or that the whole process of failing over + making the VMs work?
16:45:53 <jungleboyj> Who has tested it?
16:45:57 <geguileo> I tested it
16:46:03 <jungleboyj> Ok, that is good.
16:46:20 <geguileo> when I was helping jobernar with the RBD failover
16:46:35 <jungleboyj> Did you do anything with VMs?
16:46:36 <geguileo> and I discovered a bunch of issues in our core code and fixed most of them
16:46:41 <_erlon_> geguileo: we mean that there's little testing in Cinder and there's not a good way for users that have it in production to test the failover
16:47:03 <geguileo> jungleboyj: this was a long time ago, when Nova couldn't even force-detach
16:47:17 <_erlon_> jungleboyj: we are probably putting together some replication tests this release; we could try to run them on the gate with Ceph
16:47:22 <geguileo> so it was not possible to cleanly re-attach a volume; the old one would still be there
16:47:55 <geguileo> _erlon_: agreed, there is not enough testing
16:48:13 <jungleboyj> Ok. That isn't surprising.
16:48:16 <geguileo> _erlon_: and recovering VMs is not easy (it is HARD)
16:48:22 <jungleboyj> That is why there needs to be a detach and reattach?
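[Editor's note: the "keep failing it over back and forth periodically" cycle above is driven with the cinder admin CLI. A hedged sketch requiring a deployed cloud with admin credentials; the host and `backend_id` values are illustrative placeholders:]

```shell
# Sketch of the manual failover/failback cycle discussed above.
# Requires a running cloud and admin credentials; host and
# backend_id values are placeholders.

# Fail the backend over to its replication target:
cinder failover-host myhost@myreplicated --backend_id secondary_site

# Inspect replication status of the volume services:
cinder service-list --withreplication

# Fail back to the default (primary) backend:
cinder failover-host myhost@myreplicated --backend_id default
```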
16:48:39 <_erlon_> geguileo: if you re-attach, the device will come back on another path in the VMs, right?
16:48:56 <geguileo> jungleboyj: in a perfect world we would just tell Nova which volumes have been failed over
16:49:04 <_erlon_> yeah, I assume so
16:49:05 <geguileo> and Nova would do whatever it needs to do
16:49:15 <geguileo> _erlon_: yup
16:49:17 <jungleboyj> Yeah, that would be nice.
16:50:11 <_erlon_> geguileo: couldn't Cinder let Nova know it? So that would be a more straightforward process for the admin?
16:50:12 <geguileo> _erlon_: but maybe Nova can hack around this
16:50:25 <geguileo> _erlon_: yes, Cinder could let Nova know
16:50:33 <geguileo> which volumes and the instances they are connected to
16:50:57 <geguileo> iirc the failover method returned the volumes that have been failed over
16:51:10 <_erlon_> geguileo: Nova can, but not the OS, which will be trying to write to the dead device
16:51:20 <geguileo> so then Cinder would just need to check the instances they are attached to and tell Nova about those
16:51:22 <_erlon_> geguileo: yes they do
16:51:54 <geguileo> (ok, my memory is not that bad)
16:52:28 <geguileo> _erlon_: we would have to talk with Nova and see if it would be helpful for them to know about the change
16:52:40 <geguileo> (aka can they do something about it?)
16:53:12 <_erlon_> geguileo: mhm, I'll try to bring that to the next Nova meeting
16:53:22 <geguileo> _erlon_: thanks
16:53:25 <_erlon_> not sure if I'll make it tomorrow, but asap
16:54:00 <jungleboyj> _erlon_: Cool. Thank you for following up on that.
16:54:07 <_erlon_> jungleboyj: sure, np
16:54:37 <jungleboyj> #action Erlon to follow up with Nova team to find out if there is anything they can do to handle failures better if they know the volumes failed over and the associated instances.
16:55:02 <jungleboyj> We have 6 minutes left. Anything else that people would like to discuss?
16:55:11 <jungleboyj> geguileo: Thanks for the expertise there, by the way.
16:56:16 <jungleboyj> Ok.
Looks like everyone has gone quiet so I will wrap up the meeting.
16:56:37 <jungleboyj> Thank you all for attending. Good discussions today.
16:56:40 <smcginnis> Thanks jungleboyj
16:56:43 <whoami-rajat> Thanks!
16:56:47 <enriquetaso> o/
16:56:48 <jungleboyj> #endmeeting