16:00:28 <eglute> #startmeeting defcore
16:00:29 <openstack> Meeting started Wed Nov 18 16:00:28 2015 UTC and is due to finish in 60 minutes. The chair is eglute. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:30 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:33 <openstack> The meeting name has been set to 'defcore'
16:00:57 <eglute> Hello everyone, please let us know if you are here for the defcore meeting
16:01:04 <dwalleck> o/
16:01:04 <eglute> #link https://etherpad.openstack.org/p/DefCoreRing.3
16:01:09 <markvoelker> o/
16:01:12 <dfisher> o/
16:01:17 <mfisher_ora> o/
16:01:17 <jlk> o/
16:01:20 <hogepodge> o/
16:01:22 <pbredenb> o/
16:01:23 <albertw> o/
16:01:31 <seanw1> o/
16:01:51 <eglute> DefCore is popular this morning :D
16:01:59 <dfisher> (with Oracle) :)
16:02:01 <mfisher_ora> anthills
16:02:02 <mfisher_ora> kicked
16:02:06 <mfisher_ora> see comments last week :)
16:02:30 <VanL> o/
16:02:35 <eglute> #topic must OpenStack run Linux OS to pass DefCore
16:02:35 <onga> o/
16:02:49 <eglute> #link https://review.openstack.org/#/c/244782/
16:03:00 <jlk> ^^ that's why it's poular
16:03:07 <jlk> popular even.
16:03:08 <eglute> please review the patch and all the comments if you have not yet
16:03:36 <eglute> The patch is asking to flag specific tests, however, it raised a broader issue
16:04:20 <eglute> Would someone from Oracle like to present their case?
16:05:01 <eglute> or rather, to summarize it? mfisher_ora?
16:05:10 <mfisher_ora> So basically, these tests use a utility function inside of Tempest in order to do instance validation
16:05:27 <mfisher_ora> the instance validation are fairly basic things like 'make sure hostname is set correctly' or 'verify the number of vcpus is correct'
16:05:51 <mfisher_ora> unfortunately, the utility function is hard coded to use linux-specific commands, and if the VM OS is not linux, the tests will always fail
16:06:21 <eglute> #chair hogepodge
16:06:22 <openstack> Current chairs: eglute hogepodge
16:07:03 <mfisher_ora> The intent of the flags was to allow non-linux VM tests to pass until there's another option available in Tempest to either support multiple OSes (unwieldy and unwelcome for a lot of reasons), or provide an abstraction mechanism to allow a DefCore / RefStack user to provide their own commands or library of commands to run instead
16:07:33 <mfisher_ora> In short, because the utilities are linux specific, no VM running a non-linux OS can ever pass a defcore standard that includes these tests
16:08:06 <mfisher_ora> This doesn't solely apply to Oracle's Solaris offerings, but that's how I discovered it since currently we're testing our Nova driver and using Solaris Zones
16:08:19 <markvoelker> Point of curiosity: tests aside, do those capabilities actually all work with the Zones driver? E.g. do instance hostnames actually get set, etc?
16:08:24 <dfisher> 100%
16:08:35 <markvoelker> Thanks.
16:09:05 <mfisher_ora> Yes, I currently have my own version of remote_client and I've changed the tempest tests to use it instead of the linux client, and everything works fine
16:09:21 <eglute> Thank you mfisher_ora.
16:10:53 <markvoelker> eglute: I think there are some folks here who had strong opinions on the patch (jlk et al), perhaps we should let them say their piece as well if they have anything to add to their gerrit comments?
16:11:28 <eglute> +1
16:11:40 <eglute> Anyone would like to add anything?
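Note (not part of the IRC log): the abstraction mfisher_ora describes above, where validation commands default to Linux but can be supplied by the person running the tests, might look roughly like the sketch below. Everything in it is hypothetical: `GuestValidator`, `LINUX_DEFAULTS`, `SOLARIS_OVERRIDES` and `fake_runner` are illustrative names, not Tempest APIs, and the specific shell commands are assumptions chosen only to show the idea.

```python
from typing import Callable, Dict, Optional

# Hypothetical default command table; assumes a Linux guest.
LINUX_DEFAULTS: Dict[str, str] = {
    "hostname": "hostname",
    "vcpu_count": "grep -c ^processor /proc/cpuinfo",
}

# Hypothetical override a tester could supply for a Solaris guest.
# Assumption: `psrinfo` prints one line per virtual processor.
SOLARIS_OVERRIDES: Dict[str, str] = {"vcpu_count": "psrinfo | wc -l"}


class GuestValidator:
    """Runs validation commands inside a guest via an injected runner."""

    def __init__(self, run: Callable[[str], str],
                 overrides: Optional[Dict[str, str]] = None) -> None:
        self._run = run
        # User-supplied commands replace the Linux defaults key by key.
        self._commands = {**LINUX_DEFAULTS, **(overrides or {})}

    def hostname(self) -> str:
        return self._run(self._commands["hostname"]).strip()

    def vcpu_count(self) -> int:
        return int(self._run(self._commands["vcpu_count"]).strip())


def fake_runner(command: str) -> str:
    """Stand-in for an SSH call into the guest; returns canned output."""
    canned = {
        "hostname": "vm-1\n",
        "grep -c ^processor /proc/cpuinfo": "2\n",
        "psrinfo | wc -l": "       4\n",
    }
    return canned[command]


if __name__ == "__main__":
    validator = GuestValidator(fake_runner, SOLARIS_OVERRIDES)
    print(validator.hostname(), validator.vcpu_count())
```

The test logic only talks to `GuestValidator`, so the guest OS matters only in the command table; that is the property the flag request was meant to buy time for.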
16:11:41 <jlk> Hi
16:12:34 <jlk> My main objection is that this patch is coming at the defcore repo. It really feels like putting the cart before the horse, and it's masking things we'd really like to see pass. This driver is out of tree, and according to Nova team, it may be a long battle to get it into tree.
16:12:48 <jlk> If the goal is to make tests pass, I feel like the work should instead happen at the Tempest level.
16:13:03 <jlk> I think it's inappropriate to introduce the flags at the defcore level.
16:13:38 <jlk> Having the flags at defcore makes this a policy of "what does OpenStack mean" argument, rather than a technical "how do we do testing" argument.
16:14:08 <jlk> Flagging the tests doesn't actually make the driver any better or more complete, all it does is mask the places where the tests fail to account for the driver limitations
16:14:22 <dfisher> but defining "what OpenStack means" is a critical (if not THE critical) aspect of this patch.
16:14:34 <dfisher> it's just manifesting in a technical way
16:14:45 <jlk> that's fair.
16:14:52 <pbredenb> ..and to follow, isn't Defcore's charter to define "what OpenStack means"?
16:14:57 <eglute> any other objections that have not yet been listed?
16:14:58 <jlk> which means we don't need to focus on this specific PR
16:15:30 <dfisher> I'm in agreement with that. This PR exposed a much much larger issue that we in Solaris would like to see resolved.
16:15:32 <jlk> eglute: outside of what's in the gerrit review, I don't think so.
16:15:34 <hogepodge> jlk: one of the intentions of a flag is to mark the test as problematic and give space to not have to pass the test until it is fixed. resizing compute instances is flagged until a more secure method is implemented in nova, for example. We're trying to take improving upstream into account.
16:15:50 <mfisher_ora> A small rebuttal on flagging the test because of a technical reason, there's already precedent to do so as there is a test in defcore that is flagged because Tempest currently has skip exception applied to that test
16:16:18 <jlk> hogepodge: that's fair, if upstream is committed to fixing the issue. From Nova team feedback, it sounds like they wouldn't accept a driver that is incapable of running a Linux userland.
16:16:37 <eglute> I am in favor of having multiple OS VMs run on OpenStack, and right now we have defaulted to Linux. mfisher_ora are you working on supporting different OSs on your OpenStack deployments?
16:16:42 <jlk> If upstream isn't going to take such a driver, then flagging the tests seems inappropriate.
16:17:16 <dfisher> as a side note to jlk's previous comment: that's an absurd stance to take since Nova is designed to scale to as many compute nodes running whatever OS as an operator requires.
16:17:57 <dfisher> with properties in images and the scheduler, everything *just works* in a heterogenous environment with a Solaris compute node and Linux/Windows compute nodes.
16:18:05 <jlk> dfisher: they've clarified that at a minimum it should run a Linux userland.
16:18:18 <hogepodge> jlk: yes, it seems so. The compute driver is not a designated code section, so vendors are free to not use upstream. It's just part of the guideline, and I'm not trying to argue, just want to clarify what the standard requires (indeed, using it exposes these issues)
16:18:21 <dfisher> officially? documented? passed with all hands voting?
16:18:47 <dfisher> because that statement completely handcuffs us.
16:18:48 <jlk> dfisher: no, I don't think an official vote has happened, because the question hasn't necessarily come up.
16:19:05 <jlk> but this isn't the body to speak to Nova acceptance policy :)
16:19:11 <jlk> do we have any nova core?
16:20:08 <dfisher> we were under the impression that if our driver supported DefCore and the Hypervisor Matrix requirements, then we'd be allowed to discuss a PR for introduction of our driver.
16:20:38 <hogepodge> jlk: It definitely is. We want developer input. I would take as an action to consider changing the nova designated sections to specify in tree driver code.
16:20:53 <dfisher> that work is under way right now. Our driver passes a very large percentage of DefCore (mfisher_ora: have a %) and tempest in general.
16:21:17 <mfisher_ora> So I've been told by both openstack-qa (specifically Matt Treinish) and even in the meeting last week that you should be able to point Tempest (and by extension Defcore) at any cloud and run the tests. By defining that you must have a linux image, that is now going back on both of those statements
16:21:35 <markvoelker> dfisher: clarification: AFAIK meeting DefCore Guidelines isn't a requirement for getting your driver into Nova. There are other drivers that don't pass everything either.
16:21:58 <eglute> dfisher mfisher_ora are you working on supporting other OSs?
16:22:04 <dfisher> markvoelker: correct. we're trying to come at this from every angle. We want to cover Nova, docs, qa, tempest, etc.
16:22:15 <dfisher> eglute: it's unknown at this time
16:22:48 <dwalleck> I agree with the crux of the original argument that this is at some point a test issue. The remote_client was always meant to be an abstract base to be implemented for multiple OSes
16:23:32 <dfisher> but DefCore sets the *standard*
16:23:37 <dfisher> that's what this is about
16:23:47 <eglute> DefCore's purpose in life is interoperability
16:24:36 <eglute> Ideally, it will help users the most, so that they could switch between different openstack environments without having to worry that they will behave differently
16:24:46 <dfisher> and nova *has* that interoperability. run 1 Solaris compute node and 1+ Linux node. Everything works
16:25:01 <jlk> What we have in the review is statements from John Garbutt and Dan Smith from Nova, both seeming in agreement that a driver that can't run Linux userland wouldn't be acceptable in Nova.
16:25:06 <mfisher_ora> eglute: yes but the mechanism being used to do that is assuming something that it shouldn't
16:25:58 <mfisher_ora> and is therefore excluding offerings that can absolutely pass all of the defcore tests as long as a utility library is modified to change hard-coded commands
16:26:57 <dwalleck> mfisher_ora: and I'm all for not making those hard coded, either through multiple implementations or commands passed through a resource file
16:27:02 <jlk> but we're also talking about two different things here
16:27:14 <jlk> A single OpenStack cloud can handle multiple hypervisors for multiple purposes
16:27:49 <jlk> it's entirely possible to test an OpenStack cloud, and pass all the defcore tests, when it supplies a tiny fraction of capacity that can run Linux, while the majority is dedicated to something else
16:28:08 <dfisher> isn't that dishonest in the long run?
16:28:09 <jlk> I don't believe there is any verbiage that the /entire/ cloud has to be capable of passing the defcore standards
16:28:24 <mfisher_ora> side note dwalleck: I'd rather use the resource manager they've been talking about than asking openstack-qa to maintain multiple remote_client libraries in tree
16:28:25 <jlk> if that were the case, the Rackspace public cloud wouldn't pass
16:28:33 <markvoelker> mfisher_ora: So, let's back up a little. What's the product that you're trying to get into compliance with DefCore here? I ask b/c you mentioned running Linux+Solaris side-by-side and I'm wondering if that's part of the product?
16:28:41 <jlk> actually I take that back, it probably would, disregard that statement.
16:28:51 <jlk> dfisher: I don't know that it's dishonest.
16:29:12 <dfisher> saying that a cloud is 100% defcore compatible but only on the linux VMs?
16:29:13 <jlk> dfisher: defcore is about capability and interop. Your cloud deployment would be providing that
16:29:36 <jlk> the non-linux capacity would be added value of the cloud, beyond what defcore certifies
16:30:09 <dfisher> markvoelker: we're trying to get our Solaris offering in compliance with DefCore.
16:31:53 <markvoelker> dfisher: can you point me to a webpage or something? Need a little more background on that product as I'm not terribly familiar with it.
16:32:06 <dfisher> about the Solaris OpenStack offering?
16:32:21 <markvoelker> yes
16:32:23 <eglute> So, right now Rackspace would not pass defcore tests if we were testing on Windows VMs. Linux, no problem. Windows only would be same test issue as with Solaris
16:32:47 <mfisher_ora> And any future OS that may want to join in openstack
16:32:54 <jlk> yeah, fortunately the RAX cloud offers capacity for Linux and windows
16:33:12 <breitz1> from a pure technical standpoint - it is not hard or time consuming to create an abstraction for these tests that would allow for multiple OSs to pass. Flagging these tests until that work can happen seems appropriate. This seems completely easy to solve.
16:33:12 <jlk> and more to the point, the Hypervisors providing Windows capability are also capable of providing Linux capability.
16:33:44 <hogepodge> Defcore does not specify what hypervisor should be run
16:33:57 <dfisher> markvoelker: http://www.oracle.com/technetwork/articles/servers-storage-admin/getting-started-openstack-os11-2-2195380.html http://www.oracle.com/technetwork/server-storage/solaris11/technologies/openstack-2135773.html
16:34:05 <markvoelker> dfisher: thanks.
16:34:05 <hogepodge> Defcore does not directly specify which operating systems must run.
16:34:18 <jlk> breitz1: it seems premature to flag them, until A) upstream commits to changing the tests to be abstracted, B) upstream nova agrees to take in a hypervisor driver incapable of running Linux, and C) defcore agrees that solely offering non-Linux is acceptable.
16:34:19 <dfisher> trying to find a better "one pager" for you, but that second link has links to a bunch of things
16:34:21 <onga> markvoelker: http://www.oracle.com/technetwork/server-storage/solaris11/documentation/solaris11-2-openstack-faqs-2194278.pdf may also be of interest
16:34:29 <hogepodge> Defcore does indirectly require linuxy commands to test for hypervisor and network capabilities.
16:34:59 <dfisher> that link might be out of date since it references Solaris 11.2. 11.3 was released about a month ago.
16:35:14 <markvoelker> breitz1: The reason we're hesitant to flag things without considering carefully is that they can't be unflagged for at least six months, and in the interim users cannot depend on those capabilities from an interoperability standpoint. Hence the long discussion here. =)
16:35:28 <markvoelker> onga: thanks
16:35:41 <jlk> hogepodge: the question is, is that indirect requirement something that should be a design feature that people were just assuming, or is it an accident, or...?
16:35:44 <dfisher> also, and I don't know if this counts for anything, but our driver is completely in the open.
16:35:51 <dfisher> if you want a link to it, let me know.
16:35:56 <onga> dfisher: It's pretty high level - it's probably appropriate to get up to speed quickly on all the core components, and it is on Juno release, not Havana
16:36:15 <hogepodge> jlk: I think it's a bad test because it's checking for things it's not testing for
16:36:21 <dfisher> onga: right.
16:36:39 <hogepodge> jlk: I think that if we want linuxy behavior, that needs to be a testable capability (check-boot-cirros, for example).
16:37:04 <hogepodge> jlk: so it's not a side effect that can change or disappear
16:37:44 <mfisher_ora> real quick, where is the rule on the unflagging? I hadn't seen that one before
16:37:52 <catherineD> The test is not bad... it just needs the correct remote-client to be set ...
16:38:12 <jlk> hogepodge: I would agree to that. I would not be surprised at all to find other assumptions on test coverage that are purely accidents
16:38:16 <hogepodge> bad is a strong word
16:38:28 <mfisher_ora> which is why I didn't flag them to remove them, just suppress them until Tempest is updated to support abstracted remote client
16:38:52 <mfisher_ora> similar to the reboot server soft test that is listed in DefCore but is currently bugged in Tempest
16:38:57 <hogepodge> jlk: absolutely. We don't have a good understanding of the resource requirements needed to pass defcore testing (I'm trying to work that out this cycle).
16:39:13 <catherineD> mfisher_ora: agreed that all remote clients should be officially supported by Tempest
16:39:23 <hogepodge> (so by bad I mean 'could be better')
16:39:38 <catherineD> hogepodge: agreed ...
16:39:40 <mfisher_ora> catherineD: actually I'm not necessarily saying or agreeing with that
16:39:54 <mfisher_ora> I just think there should be a mechanism to allow a user to supply the path or the import of their own remote client if needed
16:40:04 <markvoelker> mfisher_ora: I'll dig it up in a minute for you. Basically once a flag is in, it's in for the duration of that guideline
16:40:36 <mfisher_ora> so reboot_server_soft has this for the flag action: "action": "Remove flag after Tempest bug is fixed (in progress).",
16:40:46 <mfisher_ora> that would not apply immediately to all guidelines once the bug is fixed?
16:41:04 <catherineD> The reason is that any user can also initiate DefCore tests not just the vendor ... if the remote client is not in Tempest ... not all users can have access to it
16:42:37 <markvoelker> mfisher_ora: The flag would be removed in the next guideline. Which might be up to six months away.
16:43:02 <mfisher_ora> CatherineD: which is sort of why I'd like to see the remote commands moved to the resource manager, with linux defaults provided but overridable by a user if they know they're hitting a non-linux OS VM
16:43:22 <mfisher_ora> I'm just foreseeing heavy pushback from openstack-qa if they were asked to support multiple remote_client libraries in tree
16:44:10 <dwalleck> mfisher_ora: But could the multiple remote instance clients exist out of tree as plugins?
16:45:52 <mfisher_ora> Possibly yes, I'm afraid my knowledge on the plugins is somewhat bare bones
16:46:12 <hogepodge> I have to step out of the meeting.
16:46:33 <markvoelker> #link http://git.openstack.org/cgit/openstack/defcore/tree/HACKING.rst#n81
16:46:37 <markvoelker> mfisher_ora: ^^
16:47:42 <eglute> So, if we step back and talk about user experience and interoperability, would a user that wishes to run cross-cloud applications have the same experience on solaris openstack as on any other openstack?
16:48:08 <mfisher_ora> markvoelker: thanks...although with more reading, I'm wondering if there's a technical issue with the test being suggested if it should be something lighter weight than a flag so that if the underlying bug is fixed, the 'not-flag' could be removed on all older guidelines immediately
16:48:11 <dfisher> eglute: yes
16:48:18 <markvoelker> eglute: From an API perspective, probably so from the sounds of it.
16:49:02 <markvoelker> eglute: one of the arguments has been that users would be surprised not to be able to run Linux though. I'm frankly not convinced of that yet.
16:49:24 <eglute> So if we are testing APIs, and they all work the same, what would the arguments be for requiring a linux image to be able to run?
16:49:54 <jlk> That begs the question
16:49:59 <jlk> is Interop solely about APIs?
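Note (not part of the IRC log): the flag behaviour discussed above, where a flagged test stays listed in a guideline but is not required to pass, can be illustrated with a small sketch. The JSON layout, the capability name and the test names below are assumptions made for illustration, not a copy of any real DefCore guideline file; only the `action` string is taken from mfisher_ora's quote, and the `reason` value is a placeholder.

```python
import json
from typing import Dict, List

# Assumed, simplified guideline fragment (hypothetical names throughout).
GUIDELINE = json.loads("""
{
  "capabilities": {
    "example-compute-reboot": {
      "tests": {
        "example.test_reboot_server_hard": {},
        "example.test_reboot_server_soft": {
          "flagged": {
            "reason": "placeholder reason",
            "action": "Remove flag after Tempest bug is fixed (in progress)."
          }
        }
      }
    }
  }
}
""")


def split_tests(guideline: dict) -> Dict[str, List[str]]:
    """Separate tests that must pass from tests that are currently flagged."""
    required: List[str] = []
    flagged: List[str] = []
    for capability in guideline["capabilities"].values():
        for name, meta in capability["tests"].items():
            # A test carrying a "flagged" entry is listed but not enforced.
            (flagged if "flagged" in meta else required).append(name)
    return {"required": sorted(required), "flagged": sorted(flagged)}


if __name__ == "__main__":
    print(split_tests(GUIDELINE))
```

Because the flag lives in the guideline document itself, removing it means publishing a new guideline, which matches markvoelker's point that a flag persists until the next guideline.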
16:50:15 <eglute> APIs and dedicated code sections
16:50:18 <jlk> If that is the line in the sand, then I don't think we have any argument
16:50:32 <jlk> I think Monty would object to that being the line in the sand for interop
16:50:44 <jlk> but that could be addressed in a different proposal?
16:51:04 <markvoelker> jlk: No, but frankly it's where we're at now at this particular point in time. There has been a lot of talk about other things (like image format portability), but we just aren't there yet.
16:51:14 <eglute> #link https://github.com/openstack/defcore/blob/master/doc/source/process/CoreDefinition.rst
16:51:16 <rockyg> I would think that, just as infra (and some tests) run bash scripts, that may or may not include system calls, that scripting would break on a move from linux to Solaris
16:52:06 <rockyg> Or Windows (unless windows has a linux compatibility mode)
16:52:08 <breitz1> or windows, BSD, or some new fancy cool OS that everybody jumps to next month
16:52:15 <markvoelker> jlk: we're about a year into DefCore being enforcing, so only so much percentage of the ocean we've boiled yet. =)
16:52:26 <jlk> agreed
16:52:49 * markvoelker looks at clock
16:53:04 <markvoelker> hogepodge: I see there's a note in the pad about the Board being interested in this?
16:53:09 <eglute> 7 minutes to boil the ocean
16:53:16 <markvoelker> hogepodge: Are you planning to bring this up with them, or....?
16:53:24 <eglute> hogepodge has stepped away
16:53:26 <rockyg> But, that begs the question, how many users wouldn't be aware of the OS flavor of the cloud they are running or planning to run on?
16:53:27 <mfisher_ora> I can throw another meteor at it if you need eglute
16:53:49 <eglute> but, hogepodge was telling me that before the committee makes the final decision, we take it to the board for approval
16:54:06 <rockyg> eglute, yup.
16:54:14 <markvoelker> hogepodge: I suspect the TC might be interested in the general discussion here as well if they aren't already, happy to bring it up with them for awareness.
16:54:41 <eglute> however, the committee will need to present options to the board as well as suggested actions
16:54:57 * markvoelker would like to thank the folks from Oracle, the nova-core folks, and all the others who have weighed in on this so far
16:55:38 <rockyg> So, the whole purpose in creating this whole "cloud OS" in python is to remove OS differences.
16:55:42 <eglute> I agree with markvoelker, this is a great discussion and raises awareness just how big the OpenStack ocean is and that DefCore's work is not done
16:56:36 <markvoelker> One last question for the Oracle folks: let's say QA agrees on adding a way to get those tests passing on non-Linux OS's (whether via a new agent, a plugin, or something else)...
16:57:06 <markvoelker> I presume you'd be willing to step up and help with that work since it sounds like you've done some of it already out of tree. Would you have a ballpark on how long it would take to get that support added in?
16:57:35 <mfisher_ora> absolutely, I'm currently wrapping up some other work but I intend to jump on the resource manager work for Tempest since it seems to help address this problem amongst other things
16:57:43 <markvoelker> I hear "it's pretty easy" so I'm trying to think what that means in real terms, is all.
16:58:14 * jlk has to step out
16:58:16 <dwalleck> Part of the QA work for Mitaka is a resource manager, which would be part of this solution
16:58:21 <rockyg> And, either get OS agnostic test writing rules accepted by QA, or continue the conversion as new tests are added...
16:58:25 <mfisher_ora> As for how long that would take, I really don't have an estimation but definitely into the new year since the resources changes will take some time to implement and get through the integration process
16:58:40 <markvoelker> mfisher_ora: ok, fair enough.
16:58:50 <mfisher_ora> rockyg: They're already pretty good at being agnostic, most of the stuff being done just hits the api
16:59:34 <rockyg> mfisher_ora, so, before end of Mitaka?
16:59:48 <mfisher_ora> I'd have to look again but I want to say that resource manage is M2?
16:59:52 <mfisher_ora> manager*
17:00:20 <markvoelker> One last, last question: are these tests the only thing holding you back? E.g. is everything else passing at this point?
17:00:37 <eglute> and which guideline are you testing against?
17:00:50 <mfisher_ora> They're the only tests that are skips that aren't flags. We have a few other tests waiting for an internal bug fix to be generally available
17:01:03 <markvoelker> mfisher_ora: thanks
17:01:17 * markvoelker looks at the clock again
17:01:20 <eglute> Thanks Everyone for the discussion today, we are out of time for the meeting, but this channel is always open :).
17:01:21 <mfisher_ora> And it looks like the resource manager is targeting M3
17:01:34 <catherineD> Skip tests may be caused by tempest.conf
17:01:44 <eglute> #endmeeting