15:00:21 #startmeeting manila
15:00:22 Meeting started Thu Sep 14 15:00:21 2017 UTC and is due to finish in 60 minutes. The chair is bswartz. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:23 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:25 The meeting name has been set to 'manila'
15:00:29 o/
15:00:35 hello all
15:00:38 Hi
15:00:39 hello
15:00:41 hi
15:01:29 I'll give people a moment to gather because of possible travel
15:01:34 hi
15:01:50 some of us are here at PTG in Denver, crashing the cinder sessions
15:02:01 #curling
15:03:00 Hi
15:03:04 also I'm on sketchy wifi so if I drop please be patient
15:03:25 it's been good for the most part but I have seen dropouts this week
15:03:44 #topic announcements
15:03:58 so first of all, next week is Manila PTG
15:04:28 hey
15:04:52 the etherpad link is in the channel topic, and we plan to meet for 2 days, possibly all day, but we'll try to wrap things up early each day if we can for the benefit of people in far away time zones
15:05:13 I plan to use webex for audio because that's worked well in the past
15:05:36 there is sad news from xyang1
15:05:56 Hi, I was just laid off by dell emc
15:06:08 :-(
15:06:08 xyang1: omg! :(
15:06:09 xyang1: :(
15:06:16 oh no
15:06:17 My group no longer needs open source contributors :(
15:06:43 xyang1: sorry to hear that
15:06:50 If anyone knows of an opportunity, please let me know
15:07:15 My personal email: xingyang105@gmail.com
15:08:17 it sounds like xyang wants to continue her role in openstack so any companies that want an experienced contributor on staff have a great opportunity to snap her up
15:08:33 we have a packed agenda today though so I'm going to try to move quickly
15:08:38 \o
15:08:44 #agenda https://wiki.openstack.org/wiki/Manila/Meetings
15:08:51 o/
15:08:57 #topic Add total count information in our list APIs
15:09:05 zhongjun: you're up
15:09:08 bswartz: thanks
15:09:12 #link https://review.openstack.org/#/c/501934/
15:09:12 This feature can be used for showing how many resources a user or tenant has in the web portal's summary section.
15:09:12 It was already discussed in the cinder meeting
15:09:24 In addition to openstack, I have also started contributing to kubernetes. So I am interested in opportunities there as well. Thanks!
15:09:33 The cinder proposal will match the API WG guidelines
15:09:33 Will we also follow the API WG proposal?
15:09:45 link?
15:09:55 #link https://github.com/openstack/api-wg/blob/64e3e9b07272f50353429dc51d98524642ab6d67/guidelines/counting.rst#L12
15:10:21 I think this is a good idea
15:10:50 #link https://review.openstack.org/#/c/500665/
15:10:58 it amounts to a performance optimization, and though it's not great REST, if there's a standard within the community that's good enough for me
15:11:26 xyang1: :(
15:12:01 zhongjun: thanks for bringing this in. I like it too, tommylikehu and you should propose this to api-sig
15:12:31 gouthamr: propose what?
15:12:47 gouthamr it sounds like the proposal comes FROM the API WG
15:13:00 I wasn't aware of it, but it seems reasonable to me
15:13:00 ? not that i'm aware of..
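A rough sketch of what the counting support discussed above could look like on the API side, loosely following the API WG counting guideline. The `with_count` query parameter, the WebOb-style `request` object, and the view helper are illustrative assumptions, not the design from the linked reviews:

```python
# Illustrative only: a list view that attaches a total count when the caller
# asks for it, in the spirit of the API WG counting guideline.  The
# 'with_count' parameter name is an assumption borrowed from the related
# Cinder proposal, not necessarily what Manila will merge.

def build_share_list_view(request, shares, total_count=None):
    """Build the JSON body for a share list call, optionally with a count."""
    body = {
        'shares': [{'id': s['id'], 'name': s['name']} for s in shares],
    }
    # Only include the count when it was explicitly requested, so the extra
    # database query is paid for only by callers that actually want it.
    if request.params.get('with_count') == 'true' and total_count is not None:
        body['count'] = total_count
    return body
```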
15:13:07 ohh
15:13:07 xyang1: :)
15:13:08 bswartz: okay, so we just continue to do this work
15:13:09 yes
15:13:48 okay I also don't expect this to be a ton of work to code, or to review
15:13:58 we will want functional test coverage of course
15:13:59 +1
15:14:12 bswartz: sure
15:14:18 and I recommend writing a manila spec which largely just refers to the API WG document
15:14:31 +1
15:14:48 the manila spec should enumerate all of the object types for which counts will be added
15:15:28 okay, next topic
15:15:35 bswartz: it is a simple spec, just adding some manila API descriptions
15:15:42 #topic Register and Document Policy in Code
15:15:54 this is zhongjun again
15:16:03 bswartz: Our current policy system is a little chaotic
15:16:03 Do we need to rewrite those policies:
15:16:22 #link: https://etherpad.openstack.org/p/manila-ptg-queens
15:16:27 so I haven't been talking to people about this TC goal here in Denver
15:16:35 errr
15:16:46 I _have_ been talking to people about this TC goal here in Denver
15:16:59 bswartz: Or we could just implement the policy in code and save the original policies in the first step.
15:17:16 this is something we clearly want to do
15:17:17 bswartz: ?
15:17:44 bswartz: oh, you talked with people about this
15:17:47 the general agreement is to migrate the existing policies into code first
15:17:49 separating those concerns seems like a good idea: policy in code, revamp policy content
15:18:09 save any changes to the policies themselves for later changes
15:18:25 bswartz: okay, we should just follow those steps
15:19:06 we don't need to go into great detail here because the TC goal doc is available, and zhongjun has volunteered to do the work
15:19:28 we could revisit this topic at PTG if anyone has issues
15:19:50 for this one I'm not sure we need a spec
15:20:01 would a manila-specific spec any any value here?
15:20:12 s/any/add/
15:21:01 redo policies = new spec <--- but this can wait as mentioned earlier
15:21:17 I can't think of a reason to, so let's just do this work as part of the TC goal program
15:21:23 next up
15:21:32 #topic Dynamic Log Level
15:21:48 #link https://review.openstack.org/#/c/445885/
15:21:51 bswartz: Do we need to add a REST API to control services' log levels dynamically?
15:21:52 ^^ cinder
15:22:06 tbarron: thanks
15:22:16 ugh
15:22:23 ?
15:22:24 I don't like that idea
15:22:36 it seems quite useful, why not?
15:22:40 why can't we just reread the conf file for these options?
15:23:00 bswartz: reread and restart?
15:23:07 bswartz: I believe the benefit is not having to restart the services
15:23:18 it's a bad idea to have things in the conf file overridden by API calls, because after a restart they will revert
15:23:19 bswartz: how to reread
15:23:46 maybe PTG topic then
15:23:50 I've heard about the oslo guys implementing something that allows reloading CONF files
15:23:51 we need the feature cinder has to update the conf without restarting
15:23:54 it does not matter because we can get the latest values.
15:24:16 long conversation here
15:24:30 okay should we postpone this one to PTG?
15:24:33 bswartz: Oslo.config has supported this for a few cycles now, but it still has some problems
15:25:01 bswartz: It could be hard to manage log level when we have multiple nodes and multiple services.
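For the "how to reread" question above, a minimal sketch of the oslo.config mutable-option mechanism being referred to; the option name, file path, and the way the reload is triggered are simplified for illustration (services normally wire this up through oslo.service, e.g. on SIGHUP):

```python
# Minimal sketch (not Manila code) of reloading config without a restart:
# the option is registered with mutable=True, and mutate_config_files()
# re-reads the config files and applies changes to mutable options in place.
from oslo_config import cfg

CONF = cfg.ConfigOpts()
CONF.register_opt(cfg.BoolOpt('debug', default=False, mutable=True))
CONF(['--config-file', 'manila.conf'])  # assumes this file exists

# ... later, after an operator edits manila.conf ...
CONF.mutate_config_files()  # picks up the new 'debug' value without a restart
print(CONF.debug)
```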
15:25:18 my stance is that this can and should be achieved by rereading the conf file without restarting, but I'm open to hearing reasons why that's a bad idea
15:25:51 zhongjun: no harder than keeping the rest of the conf files consistent across a deployment
15:25:57 bswartz: we'll no longer be sure of what log level a service is running at a given time.
15:26:13 zhongjun: why?
15:26:15 I assume that's a solved problem under any decent deployment tool
15:27:10 I'm familiar with how puppet keeps config files synched -- I presume other technologies work at least as well
15:27:17 PTG topic please
15:27:20 full agenda
15:27:22 okay moving on....
15:27:33 ganso: there might not be a way to get the current log level from a running service
15:27:40 #topic Install guide testing
15:27:45 tbarron: ok
15:27:55 So this came up in cinder yesterday
15:28:18 since the docs are now in our repo, there is an expectation that we QA the install guides
15:28:49 the install guides are a pretty bare bones way to install openstack -- without any deployment tools
15:29:17 testing them is as easy as reading the doc and following the instructions and seeing if you end up with a working manila installation
15:29:28 the only downside is that it's a manual process
15:29:45 and there are slightly different instructions per Linux flavor
15:30:23 because the package names and the directory paths vary slightly between centos/ubuntu/opensuse
15:30:51 if we find any bugs we should file them in LP and fix them like any code bugs
15:31:40 and we may want to consider cleaning up those docs and reducing duplication if it exists (haven't checked yet) by using sphinx includes
15:32:28 also bugs in the install guide should be fixed and backported to pike, so the sooner we can get install guide testing done the better
15:32:35 bswartz: Is there a link?
15:33:08 I'm looking for volunteers to test each platform
15:33:10 #link https://docs.openstack.org/manila/pike/install/
15:34:17 okay I just wanted to mention that
15:34:26 we might be able to do some of that work during PTG
15:34:26 there are a bunch of new bugs showing up
15:34:36 or before PTG if people can find time
15:34:45 I'll move on
15:34:50 let's settle at PTG but I can probably look at the CentOS part
15:34:58 #topic Automatic generation of docs for configuration options
15:35:11 this is another docs topic that came up here in denver
15:35:24 I can probably look at the ubuntu part :)
15:35:57 there are now tools to autogenerate the tables of config opts from the latest live source code for documentation purposes
15:36:33 so a change to a manila config option could be automatically reflected in the documentation when it merges
15:37:17 the Cinder guys will be updating their docs to use this mechanism, and I suggest we follow suit
15:37:40 +1
15:37:55 from what I heard yesterday we might need to add some small code hooks for the generator so it knows what options are relevant to each driver (for example)
15:38:16 but it sounds like a very cool sphinx plugin
15:38:23 on to my last topic
15:38:30 #topic Timeouts
15:38:58 I haven't made much progress with the infra folks here
15:39:05 #link https://review.openstack.org/#/c/493092/
15:39:46 they seem allergic to the idea of increasing our job timeouts
15:40:06 should we all just add more timed out job failures here?
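Circling back to the configuration-docs topic above: assuming the Sphinx plugin being referred to is the oslo.config extension (as the Cinder work was described), the docs-side change might look roughly like this, with the exact extension list for Manila being an assumption:

```python
# Illustrative snippet for doc/source/conf.py: enabling the oslo.config
# Sphinx extension so config option tables are generated from the live code
# at docs-build time rather than maintained by hand.
extensions = [
    'openstackdocstheme',
    'oslo_config.sphinxext',
]
```

A docs page would then use the extension's show-options directive pointed at an oslo.config.opts entry-point namespace, so option tables are regenerated from code on every docs build; the exact namespace Manila would register is not pinned down here.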
15:40:14 on the plus side they do seem committed to trying to fix the underlying problem
15:40:53 well there was a time that they claimed that the underlying problem was fixed, so I abandoned my change
15:41:03 then the timeouts came back, so I restored it
15:41:15 it's possible they'll fix another underlying problem
15:41:26 but we should keep collecting evidence
15:41:36 to recognize these look for "timeout -s 9" in the console log
15:41:40 nonvoting jobs too
15:41:46 links to job logs of timed out jobs serve two purposes:
15:41:51 e.g. http://logs.openstack.org/66/502666/8/check/gate-manila-tempest-dsvm-mysql-generic-ubuntu-xenial-nv/35914d5/console.html#_2017-09-13_14_08_22_943668
15:41:59 bswartz tbarron: We were always bordering that timeout we had in project-config
15:42:02 1) they help infra narrow down where the real problem lies
15:42:25 bswartz tbarron: in the past we worked around it by splitting tests into multiple classes
15:42:43 2) having a large amount of evidence supports our case that manila is disproportionately affected by this issue
15:42:57 i don't think infra's slow-nodes issue has anything to do with our request to increase timeout in general
15:43:04 so we can get more time to run the tests
15:43:16 however, slow nodes take 40+ minutes to Devstack smh
15:43:29 well I can modify my patch to request a more modest increase with a different justification
15:44:19 s/has anything/should have anything
15:44:23 in that case, we should find some examples of jobs passing with very little margin before the timeout, that are NOT on the so-called "very slow nodes"
15:44:37 gouthamr: isn't there an overall time budget and if it takes too long to build then the tests have even less time to run?
15:44:48 tbarron: exactly
15:44:57 tbarron: yes.. that's the one bswartz is toggling in his patch
15:45:04 we have less margin for error than other projects' jobs
15:45:23 tbarron: imo the overall budget was low... and limiting our enthusiasm to add a ton of tests
15:45:36 gouthamr: I see, you are just saying that we don't need to bump the actual *test* timeout vs the overall budget time
15:45:51 is there a second timeout?
15:46:01 tbarron: yep... there are three timeouts
15:46:15 I'm talking about the job timeout before zuul simply aborts
15:46:36 bswartz: an overall zuul job timeout (this one is funny, 10 minutes are reserved for post operations inside this timeout)
15:46:44 a test suite timeout
15:46:55 can we change the other timeouts on our own?
15:46:56 and a per-test resource wait timeout
15:47:19 oh yeah we can change the resource wait timeouts, but those only tend to matter during failures
15:47:39 bswartz: yes... the third one is a tempest timeout option, the second one: there's a DEVSTACK_GATE opt for that
15:47:49 resource failures should be kept at default, as increasing them will largely impact the duration of tests
15:48:25 ganso: we could decrease them and expect fewer timeouts overall, albeit more failures
15:48:28 s/resource failures/resource wait timeouts
15:48:33 bswartz: remember that these are nested timeouts
15:48:55 bswartz: so your patch needs to be accepted imo
15:49:03 we may need to revisit this topic next week
15:49:23 my request is just to keep a list of links to jobs that demonstrate the problem
15:49:40 anyone have elastic search fu to catch all these automatically?
15:49:49 gouthamr: Do we have many nested timeouts?
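A small, purely illustrative helper for the evidence-gathering described above: scan a saved console log for the "timeout -s 9" marker mentioned earlier as the sign that a job was killed for exceeding its budget. Nothing here is existing gate tooling; it is only a sketch of how to spot such jobs locally:

```python
# Sketch: find console log lines containing the "timeout -s 9" marker that
# indicates a job was killed by the job timeout (per the advice above).
import sys


def find_timeout_kills(console_log_path):
    """Return lines from a saved console log that show a timeout kill."""
    hits = []
    with open(console_log_path, errors='replace') as log:
        for line in log:
            if 'timeout -s 9' in line:
                hits.append(line.rstrip())
    return hits


if __name__ == '__main__':
    # Usage: python find_timeouts.py console.html
    for hit in find_timeout_kills(sys.argv[1]):
        print(hit)
```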
15:49:54 my logstash-fu is not good enough to simply query a list of all of the failures we're interested in
15:49:55 yes
15:49:59 bswartz: usually if something fails due to resource wait timeouts, then something is very wrong with the deployment and more stuff is likely to fail
15:50:09 bswartz: s/stuff/tests
15:50:20 i have a base link, but can't toggle the logstash options via API
15:50:47 we want a query that shows overall timeout plus test time is within the test timeout
15:50:56 ganso: yes I'm less concerned about the failure cases than the success cases
15:51:13 these are the cases we can't address w/o infra help
15:51:32 if it's broken you're going to have to push another patch anyway -- what I want to get away from is rechecks
15:51:49 yes
15:52:02 okay before we run out of time
15:52:06 #topic open discussion
15:52:10 anything else for today?
15:52:30 ocata driverfixes is proposed
15:52:43 link?
15:52:44 #link https://review.openstack.org/#/c/504032/
15:53:21 ack, I will review that
15:53:55 zhongjun: see https://review.openstack.org/#/c/493092/ <--- this is to extend the Zuul job timeout, which is essential
15:53:55 tbarron: +1 thanks
15:53:57 tbarron: i'm talking to eharney about our unit test issues
15:54:13 alright thanks everyone
15:54:14 he thinks we can get away without changing requirements and zuul..
15:54:30 gouthamr: thanks
15:54:31 gouthamr: thanks
15:54:32 he's trying to get unittest jobs running on driverfixes branches in cinder
15:54:44 and thanks to PTG wifi for not failing during this meeting
15:54:52 yeah good wifi
15:54:55 indeed
15:55:06 I'll see you next Wednesday morning for PTG
15:55:21 #endmeeting