15:00:21 <bswartz> #startmeeting manila
15:00:22 <openstack> Meeting started Thu Sep 14 15:00:21 2017 UTC and is due to finish in 60 minutes.  The chair is bswartz. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:23 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:25 <openstack> The meeting name has been set to 'manila'
15:00:29 <gouthamr> o/
15:00:35 <bswartz> hello all
15:00:38 <cknight> Hi
15:00:39 <ganso> hello
15:00:41 <zhongjun> hi
15:01:29 <bswartz> I'll give people a moment to gather because of possible travel
15:01:34 <tbarron> hi
15:01:50 <bswartz> some of us are here at PTG in Denver, crashing the cinder sessions
15:02:01 <gouthamr> #curling
15:03:00 <xyang1> Hi
15:03:04 <bswartz> also I'm on sketchy wifi so if I drop please be patient
15:03:25 <bswartz> it's been good for the most part but I have seen dropouts this week
15:03:44 <bswartz> #topic announcements
15:03:58 <bswartz> so first of all, next week is Manila PTG
15:04:28 <toabctl> hey
15:04:52 <bswartz> the etherpad link is in the channel topic, and we plan to meet for 2 days, possibly all day, but we'll try to wrap things up early each day if we can for the benefit of people in far away time zones
15:05:13 <bswartz> I plan to use webex for audio because that's worked well in the past
15:05:36 <bswartz> there is sad news from xyang1
15:05:56 <xyang1> Hi, I was just laid off by dell emc
15:06:08 <bswartz> :-(
15:06:08 <tbarron> xyang1: omg!  :(
15:06:09 <ganso> xyang1: :(
15:06:16 <zhongjun> oh no
15:06:17 <xyang1> My group no longer needs open source contributors :(
15:06:43 <gouthamr> xyang1: sorry to hear that
15:06:50 <xyang1> If anyone knows there is an opportunity, please let me know
15:07:15 <xyang1> My personal email: xingyang105@gmail.com
15:08:17 <bswartz> it sounds like xyang wants to continue her role in openstack, so any company that wants an experienced contributor on staff has a great opportunity to snap her up
15:08:33 <bswartz> we have a packed agenda today though so I'm going to try to move quickly
15:08:38 <dustins> \o
15:08:44 <bswartz> #agenda https://wiki.openstack.org/wiki/Manila/Meetings
15:08:51 <amito-infinidat> o/
15:08:57 <bswartz> #topic Add total count information in our list APIs
15:09:05 <bswartz> zhongjun: you're up
15:09:08 <zhongjun> bswartz: thanks
15:09:12 <zhongjun> #link https://review.openstack.org/#/c/501934/
15:09:12 <zhongjun> This feature can be used for showing how many resources a user or tenant has in the web portal's summary section.
15:09:12 <zhongjun> It was already discussed in a cinder meeting
15:09:24 <xyang1> In addition to openstack, I have also started contributing in kubernetes.  So I am interested in opportunities there as well.  Thanks!
15:09:33 <zhongjun> The cinder proposal will match the API WG guidelines
15:09:33 <zhongjun> Will we also follow the API WG proposal?
15:09:45 <bswartz> link?
15:09:55 <zhongjun> #link https://github.com/openstack/api-wg/blob/64e3e9b07272f50353429dc51d98524642ab6d67/guidelines/counting.rst#L12
15:10:21 <bswartz> I think this is a good idea
15:10:50 <tbarron> #link https://review.openstack.org/#/c/500665/
15:10:58 <bswartz> it amounts to a performance optimization, and though it's not great REST, if there's a standard within the community that's good enough for me
15:11:26 <markstur> xyang1: :(
15:12:01 <gouthamr> zhongjun: thanks for bringing this in. I like it too, tommylikehu and you should propose this to api-sig
15:12:31 <tommylikehu> gouthamr:  propose what?
15:12:47 <bswartz> gouthamr it sounds like the proposal comes FROM the API WG
15:13:00 <bswartz> I wasn't aware of it, but it seems reasonable to me
15:13:00 <gouthamr> ? not that i'm aware of..
15:13:07 <gouthamr> ohh
15:13:07 <tommylikehu> xyang1:  :)
15:13:08 <zhongjun> bswartz:  okay, so we just continue to do this work
15:13:09 <gouthamr> yes
15:13:48 <bswartz> okay I also don't expect this to be a ton of work to code, or to review
15:13:58 <bswartz> we will want functional test coverage of course
15:13:59 <tbarron> +1
15:14:12 <zhongjun> bswartz: sure
15:14:18 <bswartz> and I recommend writing a manila spec which largely just refers to the API WG document
15:14:31 <gouthamr> +1
15:14:48 <bswartz> the manila spec should enumerate all of the object types for which counts will be added
15:15:28 <bswartz> okay next topic
15:15:35 <zhongjun> bswartz: it is a simple spec, just adding some manila API descriptions
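The counting behavior discussed above can be sketched as follows; this is a hedged illustration of the response shape the API WG guideline describes, not manila's actual implementation, and the function name and parameters are hypothetical:

```python
def list_shares(all_shares, limit=None, with_count=False):
    """Sketch of a list API that optionally reports a total count.

    Per the API WG counting guideline, the count reflects all matching
    resources, not just the (possibly paginated) page returned.
    """
    body = {"shares": all_shares[:limit] if limit else list(all_shares)}
    if with_count:
        # total number of matching resources, independent of pagination
        body["count"] = len(all_shares)
    return body
```

A portal summary view would then request one page with `with_count` set and read `count` instead of fetching every resource.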
15:15:42 <bswartz> #topic Register and Document Policy in Code
15:15:54 <bswartz> this is zhongjun again
15:16:03 <zhongjun> bswartz: Our current policy system is a little chaotic
15:16:03 <zhongjun> Do we need to rewrite those policies:
15:16:22 <zhongjun> #link: https://etherpad.openstack.org/p/manila-ptg-queens
15:16:27 <bswartz> so I haven't been talking to people about this TC goal here in Denver
15:16:35 <bswartz> errr
15:16:46 <bswartz> I _have_ been  talking to people about this TC goal here in Denver
15:16:59 <zhongjun> bswartz: Or we could just implement the policy in code and save the original policies in the first step.
15:17:16 <bswartz> this is something we clearly want to do
15:17:17 <zhongjun> bswartz: ?
15:17:44 <zhongjun> bswartz: oh, you talked with people about this
15:17:47 <bswartz> the general agreement is to migrate the existing policies into code first
15:17:49 <tbarron> separating those concerns seems like a good idea: policy in code, revamp policy content
15:18:09 <bswartz> save any changes to the policies themselves for later changes
15:18:25 <zhongjun> bswartz: okay, we should just follow those steps
15:19:06 <bswartz> we don't need to go into great detail here because the TC goal doc is available, and zhongjun has volunteered to do the work
15:19:28 <bswartz> we could revisit this topic at PTG if anyone has issues
15:19:50 <bswartz> for this one I'm not sure we need a spec
15:20:01 <bswartz> would a manila-specific spec add any value here?
15:21:01 <gouthamr> redo policies = new spec <--- but this can wait as mentioned earlier
15:21:17 <bswartz> I can't think of a reason to, so let's just do this work as part of the TC goal program
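For reference, the "policy in code" pattern the TC goal is built around can be sketched like this. The dataclass below is a hypothetical stdlib stand-in mirroring oslo.policy's DocumentedRuleDefault (the real mechanism projects use), and the policy name and check string shown are illustrative:

```python
from dataclasses import dataclass, field


@dataclass
class DocumentedRuleDefault:
    """Stand-in for oslo.policy's DocumentedRuleDefault: each API policy
    is declared in code with its default rule and documentation, so a
    sample policy file can be generated instead of hand-maintained."""
    name: str
    check_str: str
    description: str
    operations: list = field(default_factory=list)


# Illustrative declarations for manila's share APIs
share_policies = [
    DocumentedRuleDefault(
        name="share:get_all",
        check_str="rule:default",
        description="List shares.",
        operations=[{"path": "/shares", "method": "GET"}],
    ),
]
```

Migrating the existing policy.json entries into declarations like these, without changing their defaults, matches the "policy in code first, revamp content later" split agreed above.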
15:21:23 <bswartz> next up
15:21:32 <bswartz> #topic Dynamic Log Level
15:21:48 <tbarron> #link https://review.openstack.org/#/c/445885/
15:21:51 <zhongjun> bswartz:  Do we need to add REST API to control services' log levels dynamically
15:21:52 <tbarron> ^^ cinder
15:22:06 <zhongjun> tbarron: thanks
15:22:16 <bswartz> ugh
15:22:23 <tbarron> ?
15:22:24 <bswartz> I don't like that idea
15:22:36 <tbarron> it seems quite useful, why not?
15:22:40 <bswartz> why can't we just reread the conf file for these options?
15:23:00 <tommylikehu> bswartz:  reread and restart?
15:23:07 <ganso> bswartz: I believe the benefit is not having to restart the services
15:23:18 <bswartz> it's a bad idea to have things in the conf file overridden by API calls, because after restart they will revert back
15:23:19 <zhongjun> bswartz: how would we reread it?
15:23:46 <tbarron> maybe PTG topic then
15:23:50 <ganso> I've heard about the oslo guys implementing something that allows reloading CONF files
15:23:51 <bswartz> we need the feature cinder has to update the conf without restarting
15:23:54 <tommylikehu> it does not matter because we can get the latest values.
15:24:16 <tbarron> long conversation here
15:24:30 <bswartz> okay should we postpone this one to PTG?
15:24:33 <zhongjun> bswartz: Oslo.config has supported this for a few cycles now, but it still has some problems
15:25:01 <zhongjun> bswartz: It could be hard to manage log level when we have multiple nodes and multiple services.
15:25:18 <bswartz> my stance is that this can and should be achieved by rereading the conf file without restarting, but I'm open to hearing reasons why that's a bad idea
15:25:51 <bswartz> zhongjun: no harder than keeping the rest of the conf files consistent across a deployment
15:25:57 <zhongjun> bswartz:  we'll no longer be sure of what log level a service is running at a given time.
15:26:13 <ganso> zhongjun: why?
15:26:15 <bswartz> I assume that's a solved problem under any decent deployment tool
15:27:10 <bswartz> I'm familiar with how puppet keeps config files synched -- I presume other technologies work at least as well
15:27:17 <tbarron> PTG topic please
15:27:20 <tbarron> full agenda
15:27:22 <bswartz> okay moving on....
15:27:33 <zhongjun> ganso: there might not be a way to get the log level from a running service
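The reread-the-conf-file approach favored above can be illustrated with a minimal stdlib sketch. In production this is what oslo.config's mutable options plus mutate_config_files() (typically triggered by SIGHUP) provide; the function name and option here are illustrative:

```python
import configparser


def current_debug_setting(conf_path):
    """Reread the conf file on demand so a log-level change takes
    effect without restarting the service.

    This only illustrates the reread itself; oslo.config's mutable
    options are the real mechanism the services would use.
    """
    parser = configparser.ConfigParser()
    parser.read(conf_path)
    return parser.getboolean("DEFAULT", "debug", fallback=False)
```

Because the conf file stays the single source of truth, a deployment tool that keeps conf files consistent across nodes also keeps log levels consistent, which addresses the multi-node concern raised above.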
15:27:40 <bswartz> #topic Install guide testing
15:27:45 <zhongjun> tbarron: ok
15:27:55 <bswartz> So this came up in cinder yesterday
15:28:18 <bswartz> since the docs are now in our repo, there is an expectation that we QA the install guides
15:28:49 <bswartz> the install guides are a pretty bare bones way to install openstack -- without any deployment tools
15:29:17 <bswartz> testing them is as easy as reading the doc and following the instructions and seeing if you end up with a working manila installation
15:29:28 <bswartz> the only downside is that it's a manual process
15:29:45 <bswartz> and there are slightly different instructions per-linux-flavor
15:30:23 <bswartz> because the package names and the directory paths vary slightly between centos/ubuntu/opensuse
15:30:51 <bswartz> if we find any bugs we should file them in LP and fix them like any code bugs
15:31:40 <bswartz> and we may want to consider cleaning up those docs and reducing duplication if it exists (haven't checked yet) by using sphinx includes
15:32:28 <bswartz> also bugs in the install guide should be fixed and backported to pike, so the sooner we can get install guide testing done the better
15:32:35 <zhongjun> bswartz: Is there a link?
15:33:08 <bswartz> I'm looking for volunteers to test each platform
15:33:10 <bswartz> #link https://docs.openstack.org/manila/pike/install/
15:34:17 <bswartz> okay I just wanted to mention that
15:34:26 <bswartz> we might be able to do some of that work during PTG
15:34:26 <gouthamr> there are a bunch of new bugs showing up
15:34:36 <bswartz> or before PTG if people can find time
15:34:45 <bswartz> I'll move on
15:34:50 <tbarron> let's settle at PTG but I can probably look at the CentOS part
15:34:58 <bswartz> #topic Automatic generation of docs for configuration options
15:35:11 <bswartz> this is another docs topic that came up here in denver
15:35:24 <zhongjun> I can probably look at the ubuntu part :)
15:35:57 <bswartz> there are now tools to autogenerate the tables of config opts from the latest live source code for documentation purposes
15:36:33 <bswartz> so a change to a manila config option could be automatically reflected in the documentation when it merges
15:37:17 <bswartz> the Cinder guys will be updating their docs to use this mechanism, and I suggest we follow suit
15:37:40 <tbarron> +1
15:37:55 <bswartz> from what I heard yesterday we might need to add some small code hooks for the generator so it knows what options are relevant to each driver (for example)
15:38:16 <bswartz> but it sounds like a very cool sphinx plugin
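As a sketch, wiring the generator into a project's docs build looks roughly like the conf.py fragment below. The `oslo_config.sphinxext` extension and its `show-options` directive are the oslo.config mechanism being described; the namespace value is illustrative:

```python
# docs conf.py fragment (sketch): enable oslo.config's sphinx extension
# so config option tables are regenerated from live code on each build.
extensions = [
    'oslo_config.sphinxext',
]

# In an .rst page, the option tables are then rendered with:
#
#   .. show-options::
#
#      manila
#
# where "manila" is the oslo.config.opts entry point namespace that
# lists the option groups (per-driver hooks would extend this).
```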
15:38:23 <bswartz> on to my last topic
15:38:30 <bswartz> #topic Timeouts
15:38:58 <bswartz> I haven't made much progress with the infra folks here
15:39:05 <bswartz> #link https://review.openstack.org/#/c/493092/
15:39:46 <bswartz> they seem allergic to the idea of increasing our job timeouts
15:40:06 <tbarron> should we all just add more timed out job failures here?
15:40:14 <bswartz> on the plus side they do seem committed to trying to fix the underlying problem
15:40:53 <bswartz> well there was a time that they claimed that the underlying problem was fixed, so I abandoned my change
15:41:03 <bswartz> then the timeouts came back, so I restored it
15:41:15 <bswartz> it's possible they'll fix another underlying problem
15:41:26 <bswartz> but we should keep collecting evidence
15:41:36 <tbarron> to recognize these look for "timeout -s 9" in the console log
15:41:40 <tbarron> nonvoting jobs too
15:41:46 <bswartz> links to job logs of timed out jobs serve 2 purposes:
15:41:51 <tbarron> e.g. http://logs.openstack.org/66/502666/8/check/gate-manila-tempest-dsvm-mysql-generic-ubuntu-xenial-nv/35914d5/console.html#_2017-09-13_14_08_22_943668
15:41:59 <gouthamr> bswartz tbarron: We were always bordering on that timeout we had in project-config
15:42:02 <bswartz> 1) they help infra narrow down where the real problem lies
15:42:25 <gouthamr> bswartz tbarron: in the past we worked around by splitting tests into multiple classes
15:42:43 <bswartz> 2) having a large amount of evidence supports our case that manila is disproportionately affected by this issue
15:42:57 <gouthamr> i don't think infra's slow-nodes issue has anything to do with our request to increase timeout in general
15:43:04 <gouthamr> so we can get more time to run the tests
15:43:16 <gouthamr> however, slow nodes take 40+ minutes to Devstack smh
15:43:29 <bswartz> well I can modify my patch to request a more modest increase with a different justification
15:44:19 <gouthamr> s/has anything/should have anything
15:44:23 <bswartz> in that case, we should find some examples of jobs passing with very little margin before the timeout, that are NOT on the so-called "very slow nodes"
15:44:37 <tbarron> gouthamr: isn't there an overall time budget and if it takes too long to build then the tests have even less time to run?
15:44:48 <bswartz> tbarron: exactly
15:44:57 <gouthamr> tbarron: yes.. that's the one bswartz is toggling in his patch
15:45:04 <bswartz> we have less margin for error than other project jobs
15:45:23 <gouthamr> tbarron: imo the overall budget was low... and limiting our enthusiasm to add a ton of tests
15:45:36 <tbarron> gouthamr: I see, you are just saying that we don't need to bump the actual *test* timeout vs the overall budget time
15:45:51 <bswartz> is there a second timeout?
15:46:01 <gouthamr> tbarron: yep... there're three timeouts
15:46:15 <bswartz> I'm talking about the job timeout before zuul simply aborts
15:46:36 <gouthamr> bswartz: an overall zuul job timeout (this one is funny, 10 minutes are reserved for post operations inside this timeout)
15:46:44 <gouthamr> a test suite timeout
15:46:55 <bswartz> can we change the other timeouts on our own?
15:46:56 <gouthamr> and a per test resource wait timeout
15:47:19 <bswartz> oh yeah we can change the resource wait timeouts, but those only tend to matter during failures
15:47:39 <gouthamr> bswartz: yes... the third one is a tempest timeout option, the second one: there's a DEVSTACK_GATE opt for that
15:47:49 <ganso> resource wait timeouts should be kept at default, as increasing them will largely impact the duration of tests
15:48:25 <bswartz> ganso: we could decrease them and expect fewer timeouts overall, albeit more failures
15:48:33 <gouthamr> bswartz: remember that these are nested timeouts
15:48:55 <gouthamr> bswartz: so your patch needs to be accepted imo
15:49:03 <bswartz> we may need to revisit this topic next week
15:49:23 <bswartz> my request is just to keep a list of links to jobs that demonstrate the problem
15:49:40 <tbarron> anyone have elastic search fu to catch all these automatically?
15:49:49 <zhongjun> gouthamr: Do we have many nested timeouts?
15:49:54 <bswartz> my log-stash-fu is not good enough to simply query a list of all of the failures we're interested in
15:49:55 <gouthamr> yes
15:49:59 <ganso> bswartz: usually if something fails due to resource wait timeouts, then something is very wrong with the deployment and more tests are likely to fail
15:50:20 <gouthamr> i have a base link, but can't toggle the logstash options via API
15:50:47 <tbarron> we want a query that shows overall timeout plus test time is within the test timeout
15:50:56 <bswartz> ganso: yes I'm less concerned about failure cases than the success cases
15:51:13 <tbarron> these are the cases we can't address w/o infra help
15:51:32 <bswartz> if it's broken you're going to have to push another patch anyways -- what I want to get away from is rechecks
15:51:49 <bswartz> yes
15:52:02 <bswartz> okay before we run out of time
15:52:06 <bswartz> #topic open discussion
15:52:10 <bswartz> anything else for today?
15:52:30 <tbarron> ocata driverfixes is proposed
15:52:43 <bswartz> link?
15:52:44 <tbarron> #link https://review.openstack.org/#/c/504032/
15:53:21 <bswartz> ack, I will review that
15:53:55 <gouthamr> zhongjun: see https://review.openstack.org/#/c/493092/ <--- this is to extend the Zuul job timeout, which is essential
15:53:55 <gouthamr> tbarron: +1 thanks
15:53:57 <gouthamr> tbarron: i'm talking to eharney about our unit test issues
15:54:13 <bswartz> alright thanks everyone
15:54:14 <gouthamr> he thinks we can get away without changing requirements and zuul..
15:54:30 <zhongjun> gouthamr: thanks
15:54:31 <tbarron> gouthamr: thanks
15:54:32 <gouthamr> he's trying to get unittest jobs running on driverfixes branches in cinder
15:54:44 <bswartz> and thanks to PTG wifi for not failing during this meeting
15:54:52 <gouthamr> yeah good wifi
15:54:55 <amito-infinidat> indeed
15:55:06 <bswartz> I'll see you next wednesday morning for PTG
15:55:21 <bswartz> #endmeeting