13:33:44 #startmeeting CI Scrum 13:33:45 Meeting started Wed Nov 2 13:33:44 2016 UTC and is due to finish in 60 minutes. The chair is adreznec. Information about MeetBot at http://wiki.debian.org/MeetBot. 13:33:45 With good intentions, of course. 13:33:46 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. 13:33:49 The meeting name has been set to 'ci_scrum' 13:33:55 whoa. 13:33:59 #topic Overview of current status 13:34:01 noyce. 13:34:13 Hmm 13:34:34 Anyway 13:34:51 I'll turn the floor over to thorst 13:34:53 OK - so the CI runs on Tempest VMs. They run remote to the NovaLink. So the fact that I made them with a file path is what was broken 13:35:20 unfortunately we didn't catch cause CI was down, and now we can't get the CI running until we get a fix in 13:35:31 here are my thoughts... 13:36:00 we could add a config option to go back to old way. I think for the image size its fine. We needed this new path for when people like kriskend and seroyer deploys twenty 20-gig images 13:36:11 but our CI is doing very small images, and most are linked clones (though not all) 13:36:28 So a secret config option or a public one? 13:36:39 (I'm down with the idea, btw. 13:36:40 ) 13:36:47 We're open source, hard for it to be a secret config option :P 13:36:47 efried: I think it should be public 13:37:00 thorst: We are moving to large images once we add in the OSA stuff 13:37:02 and I think that we need it public for our new in tree driver too. 13:37:19 esberglu: larger for the under cloud, but not the VMs that tempest deploys 13:37:26 So we're talking about reinstating the IterableToFileAdapter and all that 13:37:30 the under cloud would actually use the new path...because it is running on the novalink 13:37:40 No larger for the vms. OSA nodes need 50G free space 13:37:45 efried: well, I replaced it with a ChunkyFileIter in an old path... 13:38:07 in an old patch on the same change set? 
13:38:10 esberglu: yeah, but not for the guest VMs deployed as part of the CI run 13:38:11 NM i'm dumb 13:38:11 esberglu: run...but that's all VMs that the under cloud provisions, not what the Tempest running in the VM provisions 13:38:25 Yeah I was confused by what you meant 13:38:26 run -> right 13:38:28 I get it now 13:38:52 Nope. In an abandoned change set? 13:39:02 * adreznec can't help but laugh every time he sees ChunkFileIter 13:39:06 *Chunky 13:39:13 efried: yeah, its actually a previous version of what merged 13:39:22 but what I'm not sure about is what to call this opt 13:39:28 "compat_upload_mode" or something 13:39:32 Anyway. So basically this configopt would provide a way to revert back to that not-quite-as-old behavior 13:40:08 right. 13:40:10 I guess it depends on whether we want this configopt to be specific for toggling to the old upload function 13:40:28 Or if we want it to be a development-only configopt for allowing the driver to work remotely in the future 13:40:37 hmmm 13:40:39 that's kinda neat. 13:40:44 Where we could have other things this configopt "fixes" 13:40:56 but whatever it is, it will need to persist into the future in-tree driver too 13:41:00 Something like "remote_driver_dev" 13:41:30 adreznec: slippery slope...next thing you know we'll have config opts for remote ips 13:41:33 and look like a remote driver. 13:41:37 Yeah, that's what I meant by a "private" option - basically either undocumented, or documented as "don't use this unless you're us" 13:41:57 thorst: yeah 13:42:00 Idk 13:42:04 efried adreznec: another idea... 13:42:13 is there a way we could ... hide this somehow? 13:42:24 put a pypowervm patch in that says 'no, you don't get to do that' 13:42:34 I was looking at it, and I couldn't see a way to do it easily. 13:42:53 we'd have to read in from the file path...and then pipe that into the REST layer 13:43:07 Or we put the IterableToFileAdapter (or the artist formerly known as) into pypowervm. 
13:43:21 efried: well, really a FileToIterableAdapter 13:43:24 And then pypowervm detects remote and overrides the specified option, ignoring the function. 13:43:40 well, you need the function...cause that's the only interlock into glance 13:44:08 I kinda prefer that cause we need to patch the pypowervm in OSA already... 13:44:16 and in our devstack... 13:44:31 its trickier but I think it prevents slipperiness 13:44:38 oh, you want this as part of local2remote patch, not a permanent fixture in pypowervm? 13:44:44 I guess that makes sense. 13:44:45 efried: right. 13:44:52 Yeah 13:45:21 Basically this would be another library tweak for CI 13:45:24 those were only two ways I could think of it... Config opt or hide it in pypowervm local2remote. I prefer the second cause it will just work with everything and limit it to our CI 13:45:43 btw, does local2remote become moot if we decide to support remote pypowervm officially? 13:45:50 efried: nah 13:46:02 because its really a question of whether or not we support nova-powervm remotely 13:46:15 which we don't, except for CI (to allow scale) 13:46:42 Right 13:46:56 Okay, so presumably... 13:46:56 #action thorst to propose the local2remote patch to make this work 13:46:56 ? 13:47:01 And I can't see a compelling reason to want to outside development... kind of defeats the purpose of the driver 13:47:14 can we swap the owner to efried? 13:47:20 cause I want a different action later in meeting :-D 13:47:51 (I want to spend time updating the nova-powervm proposal rst) 13:47:55 You'd be quicker at it, but if you show me this interim thing you mentioned (which I have not been able to find), I'll take it on. 13:48:05 OK - we'll work it together. 
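The "FileToIterableAdapter" idea above (read from a file path, then feed the REST layer an iterable of chunks) can be sketched roughly as follows. This is only an illustration under assumptions: `iter_file_chunks` and its `chunk_size` parameter are hypothetical names, and this is not the IterableToFileAdapter or ChunkyFileIter code from nova-powervm or pypowervm.

```python
import io


def iter_file_chunks(fileobj, chunk_size=64 * 1024):
    """Yield successive chunks from a binary file-like object until EOF.

    Hypothetical helper: the name and default chunk size are assumptions.
    """
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            # An empty read means end of file.
            break
        yield chunk


# Usage: stream a small in-memory "image" in 4-byte chunks.
image = io.BytesIO(b"0123456789")
chunks = list(iter_file_chunks(image, chunk_size=4))
```

A consumer that wants an iterable (rather than a path) could be handed `iter_file_chunks(open(path, "rb"))`; whether the real upload function accepts that shape is not confirmed by this log.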
13:48:25 apearson would be the quickest, but he's decided to be in Europe and be afk 13:48:26 #agreed thorst & efried to work the local2remote patch 13:48:27 #action efried and thorst work together to bring harmony to the CI system 13:48:49 Ok, so that would get applied as part of our existing patch path then 13:48:53 yuh 13:48:55 so that brings to second point 13:48:58 No new code required from esberglu so far 13:49:01 yuh 13:49:04 how are we going to do our CI for proper nova 13:49:10 I think this patch solves one aspect of it 13:49:15 #topic In-tree driver CI discussion 13:49:37 but the other was that I was planning on localdisk for in-tree. I think we need SSP at a minimum for CI 'harmony' 13:49:54 Yeah, a bit of a wrinkle 13:50:14 @thorst - so I don't have to read through a ton (yeah, I'm lazy), is there a short summary I can look at to help? 13:50:15 but...I think its not that awful? We could lead with SSP... 13:50:32 apearson: don't worry about it - I was just poking on how you're supposedly afk 13:50:53 oh fine - I know when I'm not wanted... 13:51:03 #link https://review.openstack.org/#/c/381772/ <-- Driver blueprint 13:51:38 thorst: would we really lead with SSP only? 13:51:57 I think we'd also want localdisk in the mix 13:52:17 adreznec: I think we throw both in 13:52:23 see what sticks 13:52:23 That would allow us to run the most basic case of the driver 13:52:25 Ok 13:52:26 but we probably develop SSP first. 13:52:30 There's a matter of staging, in any case. We would probably want to ... yeah. 13:52:31 so that we get CI running ASAP 13:53:01 #agreed on including both localdisk and SSP in the first pass of the in-tree driver 13:53:37 alright...amazing. We have a plan on those. 13:53:46 Yep 13:53:49 #action thorst to update powervm blueprint 13:53:57 does that actually do anything? 13:54:03 Are there any other things we need to decide in the blueprint? 13:54:10 It should in the meeting minutes 13:54:12 not sure...probably, but I haven't looked. 
13:54:13 I think it makes stuff appear in a different font in the meeting minutes. 13:54:14 (if we get meeting minutes) 13:54:21 well, haven't looked in depth. E-mail hell. 13:54:34 Looking at comments now 13:54:56 First one up was about an overview of old powervm vs powervc vs powerkvm vs new powervm driver 13:55:19 yeah, I can put that stuff in. None of this is really heart burn. 13:55:23 I think a couple lines on that is fine 13:55:50 I responded to a few of the comments with links to the WIP change sets. 13:56:00 and some we already discussed in unconference...so I think we're really OK here. 13:56:09 Yeah 13:56:10 Ok 13:56:14 next topic? 13:56:19 There's probably only three or four comments that need some nontrivial text added. 13:56:21 #topic Next steps on stabilizing CI 13:56:29 I have a couple things for that 13:56:36 So esberglu once we land the updated local2remote patch, what's next? 13:57:02 adreznec: I just saw your comment about disabling stable/mitaka runs. stable/mitaka is not compat. with 16.04, which we have now moved to 13:57:15 Ah, right 13:57:48 Ok, I think I'm ok with dropping that from CI runs... 13:58:13 But also I think there is another issue. The run where we discover the above remote thing only took 1 hour. Some are still taking forever / timing out 13:58:17 We'd be stuck with it through the next cycle without CI going, but... I'm not sure it's a big deal 13:58:59 I think there is a devstack config option to force runs even though devstack hasn't been tested on 16.04 13:59:20 Yeah 13:59:21 If we want to try that on staging at some point and see what happens 13:59:25 You can always force the run 13:59:36 adreznec esberglu: we need 16.04 because OSA, right? 13:59:40 Yeah 13:59:45 Not sure it's worth the headache down the road 13:59:49 Well and for Ocata 14:00:00 Ocata isn't going to support trusty for most projects by the end of the cycle 14:00:41 So we'd be here in a month or two anyway 14:01:15 OK - yeah, I'm OK with that. 
14:01:22 unfortunate, but OK. 14:01:28 can't do something like that when in tree... 14:01:29 So the timeouts appear to be related to our multiplexed image upload algorithm. 14:01:39 Yeah 14:01:50 efried: when you say multiplexed... 14:01:51 We need a deeper debug (I'm probably on the hook for that); but I think a broader design discussion may be in order. 14:01:57 do you mean my code or your marker lu thing? 14:01:58 We'll need to figure out handling multiple image "flavors" for different branches down the road 14:02:07 probably the wrong term. I mean the marker LU thing. 14:02:25 Fortunately we have ~2 years to figure that out, probably 14:02:25 efried: and how much of that is due to marker lu or the fact that the file never actually uploads (my thing) 14:03:01 thorst, you mean the thing that _just_ happened? Not related. The marker-based upload stuff behaves properly in that scenario. 14:03:07 Which is why this is kinda bizarre. 14:03:16 adreznec: That multiple flavor thing will be a piece of cake once zuul v3 comes out 14:03:19 It should be behaving the same on any other kind of failure. 14:03:23 Yep 14:03:40 That's why I don't think it's worth chasing now 14:03:54 When we get more complex config (static nodes, etc) with zuulv3 14:03:58 so revisit when we have things a bit more stable (patch landed) 14:04:49 ready to move onto the issues that wangqwsh is hitting? 14:04:54 Ok, we'll need to have a deeper dive into this once we land the local2remote stuff 14:04:59 Wait 14:05:08 aren't we still discussing the upload hangs? 14:05:12 * thorst waiting... 14:05:20 Yes 14:05:50 1) I wonder if we need to move the marker LU *creation* inside the try/finally; 2) I wonder if we somehow need to handle the scenario where deleting the marker LU fails; but most profoundly, 3) should we consider a timeout of some kind, where I can delete a marker LU I didn't create if a certain amount of time has elapsed? 
(scary) 14:06:37 It's possible 3a) we can detect whether the real image LU hasn't been created for "a while" and act then. 14:06:39 a timeout scares the hell out of me 14:06:55 a timeout where we see no progress being made doesn't scare me 14:06:57 Yeah, there's no way we can really set expectation for the speed of the actual upload. 14:07:07 well, if we see any bytes moving...then ok 14:07:12 but do we even get that visibility? 14:07:19 Yeah, I don't know if there's a way to detect how much of the upload has happened. 14:08:10 Do we really have visibility into the rate of data happening in the upload? 14:08:43 The schema doesn't provide anything but the capacity as far as the LU itself is concerned. 14:09:03 And remember, the whole point is that the upload is happening from a different nvl that we can't talk to (except through the SSP). 14:09:29 So... we could theoretically use the marker LU as a message bus. This gets pretty complicated. 14:09:50 Have the owner of the marker LU write heartbeats of some kind, and the other guys read the heartbeats. 14:10:07 Now we have clock sync problems and everything; but we can get around that. 14:10:13 can you see a last touched thing? 14:10:28 get a time that the marker was last touched and have the one uploading actually touch the marker 14:10:49 Not via REST. Would at the very least have to map & mount it. 14:10:52 though, we get into the same lock contention we'd be in otherwise 14:10:55 whoa, no mounts 14:11:11 Ew ew ew 14:11:16 Maybe not mount. 14:11:25 What metadata does linux provide on a mapped device? 14:11:54 So yeah, not map, but read. 14:12:16 Basically have raw, dd-able data written by the marker owner, read by the waiters. 14:12:17 should we table that for further discussion? I want to make sure we get to wangqwsh's item because it is late for him 14:12:25 we can swing back to it? 14:12:26 sure. 14:12:33 Ok 14:12:37 If I propose a patch for 1 & 2... 14:12:39 esberglu: any other CI stabilizing topics? 
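The heartbeat idea above (marker owner writes heartbeats, waiters read them, and a waiter only acts when no progress is seen) might look something like the sketch below. It is an assumption-laden illustration, not the pypowervm marker LU code: `MarkerWatcher`, `observe`, and `stale_after` are hypothetical names, and how the heartbeat value would actually be read from the marker is left out. The waiter times the gap between *changes* it observes using its own monotonic clock, which is one possible way around the clock sync concern.

```python
import time


class MarkerWatcher:
    """Waiter-side view of a marker's heartbeat value (hypothetical)."""

    def __init__(self, stale_after, clock=time.monotonic):
        self.stale_after = stale_after  # seconds without progress
        self._clock = clock             # local clock only; no cross-host sync
        self._last_value = None
        self._last_change = None

    def observe(self, heartbeat_value):
        """Record one heartbeat read; return True if the marker looks stale."""
        now = self._clock()
        if self._last_change is None or heartbeat_value != self._last_value:
            # First read, or the owner made progress: restart the timer.
            self._last_value = heartbeat_value
            self._last_change = now
            return False
        return (now - self._last_change) >= self.stale_after
```

A waiter would call `observe()` on each poll and consider removing a marker it did not create only after `observe()` returns True. Any real threshold would need to be far longer than the owner's heartbeat interval.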
14:12:44 Can it be tested without merging it? 14:12:50 efried: that would probably be a good place to start discussion 14:12:53 and I think we could? 14:12:55 k 14:13:19 I think that's it 14:13:44 Ok 14:14:00 #topic OpenStack-Ansible CI bring-up 14:14:09 wangqwsh: thorst the floor is yours 14:14:18 Oh, right 14:14:34 #action efried to start proposing discussion patches on marker LU enhancements 14:14:40 As you were 14:14:45 (#1 is kinda dead in the water, alas) 14:15:48 alright. wangqwsh I think you were seeing odd Configparser import issues due to the use of the local2remote patch in your OSA CI 14:15:59 as of last night when we discussed, we didn't really have a plan... 14:16:14 I think we have two options 14:16:29 wangqwsh: did you make any progress on it or is that still the latest? 14:16:43 no progress... 14:16:43 Either install configparser into the nova-master venv for the compute node 14:16:54 Or make it so the local2remote patch doesn't require configparser 14:17:02 let me look at that patch... 14:17:28 ewww...the patch has a tab in it! 14:17:44 repo container builds the wheels. 14:18:13 wangqwsh: that doesn't really matter, we could patch it in post-build 14:18:28 adreznec efried: I feel like ConfigParser could easily be removed...for something more trivial. Though it's probably a few hours of work. 14:19:03 are we using the 'setup.ini'? 14:19:05 Hmm ok 14:19:09 Let me look at the patch 14:19:11 sorry...pypowervm.ini 14:20:13 I could rewrite the config parsing if I had to. not trivial. 14:20:30 yeah, but are we even using it... 14:20:36 it looks like it could fall back to nothing 14:20:56 other than in the patch? I don't think so. 14:21:32 ugh, and do the discovery every time, which is slow. 14:21:41 efried_otm: yeah, but 14:21:47 discovery when you start the adapter. 14:21:47 but yeah, that's the east path. 14:21:51 which is once. 14:21:59 maybe twice. 14:22:04 easy* 14:23:30 Hmm 14:23:37 flip side...
14:23:45 how hard is it to add that dependency to the container? 14:24:00 adreznec / esberglu? 14:24:10 Let's see 14:24:20 So the actual action of adding it to the container is really easy, right? 14:24:48 The path is consistent, so we'd source /path/to/nova/venv/bin/activate 14:25:05 and pip install configparser == v.whatever there 14:25:11 timing might be more complicated 14:26:10 wangqwsh: at what point in the run were you seeing the failure? 14:26:49 Would it be enough to let the OSA AIO finish running, then before we kick tempest, patch configparser into the venv, restart nova-compute, validate it comes up, then do the tempest run? 14:26:50 start the nova-compute service, it printed 14:27:16 but will OSA be OK if nova-compute just dies 14:27:23 I think so 14:27:28 Easy to test locally 14:27:28 yes 14:27:42 I'll just break my driver settings, kick off an AIO and find out :) 14:27:55 I think it will though 14:28:10 I don't think it checks service state for long enough to notice the failure 14:28:32 Ok, so 2 minutes left here 14:29:22 wangqwsh: do you want to try patching configparser into the venv and seeing if nova-compute works? 14:29:53 Should just need to run "source /openstack/venvs/nova-master/bin/activate" and "pip install configparser" 14:29:58 how to install the pkg? via pip? 14:29:59 Then restart nova-compute 14:30:16 I'll test the driver breakage situation on my AIO 14:30:20 To see if that timing would work 14:30:26 the pip config was changed to repo container. 14:30:37 Ah, and configparser isn't there? 14:30:50 pip install would not find the pkg. 14:30:50 yes 14:30:55 Hmm ok 14:31:06 That's inconvenient 14:31:38 I'm wondering if maybe we try to remove the dependency in pypowervm... Maybe wangqwsh could drive that and efried could review? 14:31:47 just seems...simpler...
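One way the "fall back to nothing" option above could be approached is to make the INI read optional, so a missing parser module or missing file simply yields no settings. The sketch below is an assumption, not the local2remote patch: `load_ini_section` is a hypothetical helper, and the section and option names in the usage note are placeholders rather than real pypowervm.ini keys.

```python
def load_ini_section(path, section):
    """Return {option: value} for one INI section, or {} if unavailable.

    Hypothetical helper; an empty dict means "use defaults / discovery".
    """
    try:
        import configparser  # Python 3 stdlib name (also the PyPI backport)
    except ImportError:
        try:
            import ConfigParser as configparser  # Python 2 stdlib name
        except ImportError:
            return {}  # no parser module at all: fall back to nothing
    parser = configparser.RawConfigParser()
    # read() returns the list of files it managed to parse; a missing
    # file gives an empty list instead of raising.
    if not parser.read(path) or not parser.has_section(section):
        return {}
    return dict(parser.items(section))
```

For example, `load_ini_section("/some/path/example.ini", "adapter")` would return `{}` when the file does not exist, leaving the caller to do discovery as discussed. The cost noted in the log (discovery at adapter start) still applies.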
14:31:48 I wonder if we could just patch in configparser as an additional dependency for nova-powervm in CI runs 14:32:10 and then it would just end up in the venv 14:32:54 repo container builds the wheels using openstack requirement files. 14:33:30 wangqwsh: want to try those two approaches? 1) Change the nova-powervm requirements to include Configparser (I find that eww) and 2) work on the local2remote patch with efried to see how to remove that dependency 14:34:10 ok 14:34:10 wangqwsh: right, we could basically add configparser to the list of requirements needed for nova-powervm, but only for CI runs 14:34:17 Sure 14:34:23 I'll take #1 14:34:38 #action adreznec to test patching nova-powervm requirements to include configparser in OSA CI runs 14:34:58 #action wangqwsh and efried to evaluate removing configparser dependency from local2remote patch 14:35:17 #topic Future meetings 14:35:25 So I think this has been pretty productive 14:35:27 I liked this. We should do it again 14:35:34 What do you guys think about doing this weekly 14:35:39 I can get something scheduled 14:35:41 +2 14:35:51 efried: esberglu wangqwsh ^^ ?? 14:35:57 we should get a wiki out there too, like the nova meetings. 14:36:01 Right 14:36:04 that was my plan 14:36:10 Formalize this as a driver meeting 14:36:19 Yeah I think this was better than phone calls 14:36:33 Plus there is now a chat history 14:36:34 Cool 14:36:35 if the #1 not work, i can try #2 14:36:35 sure 14:36:35 1 question: 14:37:05 Does this time slot work for people? 14:37:14 works for me 14:37:24 unless one of us has to SDB present... 14:37:32 Right 14:37:36 hscipaddess issue 14:37:39 Which would be an issue in 2 weeks 14:37:48 Ok, I'll look at calendars 14:39:19 thorst: do you mean the hscipaddress works for you? 
14:39:49 wangqwsh: I think that will go away with the ConfigParser dependency 14:39:59 I only saw that error once, so I think it was an anomaly 14:40:41 Yeah 14:40:45 I think that was a timing issue 14:40:54 ok, 14:41:02 i will try it again 14:41:16 #action adreznec to schedule weekly driver team meeting 14:41:28 All right, I think we're done here? 14:41:32 And we're over time 14:41:37 Thanks everyone! 14:41:39 damn...almost made it 14:41:41 thx! 14:41:43 #endmeeting