13:33:44 <adreznec> #startmeeting CI Scrum
13:33:45 <openstack> Meeting started Wed Nov 2 13:33:44 2016 UTC and is due to finish in 60 minutes. The chair is adreznec. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:33:45 <efried> With good intentions, of course.
13:33:46 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:33:49 <openstack> The meeting name has been set to 'ci_scrum'
13:33:55 <thorst> whoa.
13:33:59 <adreznec> #topic Overview of current status
13:34:01 <efried> noyce.
13:34:13 <adreznec> Hmm
13:34:34 <adreznec> Anyway
13:34:51 <adreznec> I'll turn the floor over to thorst
13:34:53 <thorst> OK - so the CI runs on Tempest VMs. They run remote to the NovaLink. So the fact that I made them with a file path is what was broken
13:35:20 <thorst> unfortunately we didn't catch it cause CI was down, and now we can't get the CI running until we get a fix in
13:35:31 <thorst> here are my thoughts...
13:36:00 <thorst> we could add a config option to go back to the old way. I think for the image size it's fine. We needed this new path for when people like kriskend and seroyer deploy twenty 20-gig images
13:36:11 <thorst> but our CI is doing very small images, and most are linked clones (though not all)
13:36:28 <efried> So a secret config option or a public one?
13:36:39 <efried> (I'm down with the idea, btw.
13:36:40 <efried> )
13:36:47 <adreznec> We're open source, hard for it to be a secret config option :P
13:36:47 <thorst> efried: I think it should be public
13:37:00 <esberglu> thorst: We are moving to large images once we add in the OSA stuff
13:37:02 <thorst> and I think that we need it public for our new in tree driver too.
13:37:19 <thorst> esberglu: larger for the under cloud, but not the VMs that tempest deploys
13:37:26 <efried> So we're talking about reinstating the IterableToFileAdapter and all that
13:37:30 <thorst> the under cloud would actually use the new path...because it is running on the novalink
13:37:40 <esberglu> No, larger for the VMs. OSA nodes need 50G free space
13:37:45 <thorst> efried: well, I replaced it with a ChunkyFileIter in an old path...
13:38:07 <efried> in an old patch on the same change set?
13:38:10 <adreznec> esberglu: yeah, but not for the guest VMs deployed as part of the CI run
13:38:11 <esberglu> NM i'm dumb
13:38:11 <thorst> esberglu: run...but that's all VMs that the under cloud provisions, not what the Tempest running in the VM provisions
13:38:25 <esberglu> Yeah I was confused by what you meant
13:38:26 <thorst> run -> right
13:38:28 <esberglu> I get it now
13:38:52 <efried> Nope. In an abandoned change set?
13:39:02 * adreznec can't help but laugh every time he sees ChunkFileIter
13:39:06 <adreznec> *Chunky
13:39:13 <thorst> efried: yeah, it's actually a previous version of what merged
13:39:22 <thorst> but what I'm not sure about is what to call this opt
13:39:28 <thorst> "compat_upload_mode" or something
13:39:32 <adreznec> Anyway. So basically this configopt would provide a way to revert back to that not-quite-as-old behavior
13:40:08 <thorst> right.
13:40:10 <adreznec> I guess it depends on whether we want this configopt to be specific for toggling to the old upload function
13:40:28 <adreznec> Or if we want it to be a development-only configopt for allowing the driver to work remotely in the future
13:40:37 <thorst> hmmm
13:40:39 <thorst> that's kinda neat.
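A minimal Python sketch of what a toggle like the "compat_upload_mode" option floated above might select between, assuming the old behavior streamed image bytes to the REST layer and the new one hands over a file path. The names iter_file_chunks and pick_upload_source are hypothetical illustrations, not nova-powervm or pypowervm APIs, and the real ChunkyFileIter is not shown in this log.

```python
def iter_file_chunks(path, chunk_size=64 * 1024):
    """Yield the file at `path` as a sequence of byte chunks (hypothetical helper)."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                return
            yield chunk


def pick_upload_source(image_path, compat_upload_mode):
    """Choose what to hand to the upload call (hypothetical helper).

    Assumption: with the flag set, the caller streams bytes, which can work
    when the driver runs remote to the NovaLink; otherwise it passes the
    file path, which only works when the file is local to the NovaLink.
    """
    if compat_upload_mode:
        return "stream", iter_file_chunks(image_path)
    return "path", image_path
```

A caller would branch on the returned kind; whether such a flag lives in a public config option or in the CI-only local2remote patch is exactly the question debated in the messages that follow.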
13:40:44 <adreznec> Where we could have other things this configopt "fixes"
13:40:56 <thorst> but whatever it is, it will need to persist into the future in-tree driver too
13:41:00 <adreznec> Something like "remote_driver_dev"
13:41:30 <thorst> adreznec: slippery slope...next thing you know we'll have config opts for remote ips
13:41:33 <thorst> and look like a remote driver.
13:41:37 <efried> Yeah, that's what I meant by a "private" option - basically either undocumented, or documented as "don't use this unless you're us"
13:41:57 <adreznec> thorst: yeah
13:42:00 <adreznec> Idk
13:42:04 <thorst> efried adreznec: another idea...
13:42:13 <thorst> is there a way we could ... hide this somehow?
13:42:24 <thorst> put a pypowervm patch in that says 'no, you don't get to do that'
13:42:34 <efried> I was looking at it, and I couldn't see a way to do it easily.
13:42:53 <thorst> we'd have to read in from the file path...and then pipe that into the REST layer
13:43:07 <efried> Or we put the IterableToFileAdapter (or the artist formerly known as) into pypowervm.
13:43:21 <thorst> efried: well, really a FileToIterableAdapter
13:43:24 <efried> And then pypowervm detects remote and overrides the specified option, ignoring the function.
13:43:40 <thorst> well, you need the function...cause that's the only interlock into glance
13:44:08 <thorst> I kinda prefer that cause we need to patch the pypowervm in OSA already...
13:44:16 <thorst> and in our devstack...
13:44:31 <thorst> it's trickier but I think it prevents slipperiness
13:44:38 <efried> oh, you want this as part of local2remote patch, not a permanent fixture in pypowervm?
13:44:44 <efried> I guess that makes sense.
13:44:45 <thorst> efried: right.
13:44:52 <adreznec> Yeah
13:45:21 <adreznec> Basically this would be another library tweak for CI
13:45:24 <thorst> those were the only two ways I could think of it... Config opt or hide it in pypowervm local2remote. I prefer the second cause it will just work with everything and limit it to our CI
13:45:43 <efried> btw, does local2remote become moot if we decide to support remote pypowervm officially?
13:45:50 <thorst> efried: nah
13:46:02 <thorst> because it's really a question of whether or not we support nova-powervm remotely
13:46:15 <thorst> which we don't, except for CI (to allow scale)
13:46:42 <adreznec> Right
13:46:56 <efried> Okay, so presumably...
13:46:56 <efried> #action thorst to propose the local2remote patch to make this work
13:46:56 <efried> ?
13:47:01 <adreznec> And I can't see a compelling reason to want to outside development... kind of defeats the purpose of the driver
13:47:14 <thorst> can we swap the owner to efried?
13:47:20 <thorst> cause I want a different action later in meeting :-D
13:47:51 <thorst> (I want to spend time updating the nova-powervm proposal rst)
13:47:55 <efried> You'd be quicker at it, but if you show me this interim thing you mentioned (which I have not been able to find), I'll take it on.
13:48:05 <thorst> OK - we'll work it together.
13:48:25 <thorst> apearson would be the quickest, but he's decided to be in Europe and be afk
13:48:26 <efried> #agreed thorst & efried to work the local2remote patch
13:48:27 <adreznec> #action efried and thorst work together to bring harmony to the CI system
13:48:49 <adreznec> Ok, so that would get applied as part of our existing patch path then
13:48:53 <efried> yuh
13:48:55 <thorst> so that brings us to the second point
13:48:58 <adreznec> No new code required from esberglu so far
13:49:01 <efried> yuh
13:49:04 <thorst> how are we going to do our CI for proper nova
13:49:10 <thorst> I think this patch solves one aspect of it
13:49:15 <adreznec> #topic In-tree driver CI discussion
13:49:37 <thorst> but the other was that I was planning on localdisk for in-tree. I think we need SSP at a minimum for CI 'harmony'
13:49:54 <adreznec> Yeah, a bit of a wrinkle
13:50:14 <apearson> @thorst - so I don't have to read through a ton (yeah, I'm lazy), is there a short summary I can look at to help?
13:50:15 <thorst> but...I think it's not that awful? We could lead with SSP...
13:50:32 <thorst> apearson: don't worry about it - I was just poking on how you're supposedly afk
13:50:53 <apearson> oh fine - I know when I'm not wanted...
13:51:03 <adreznec> #link https://review.openstack.org/#/c/381772/ <-- Driver blueprint
13:51:38 <adreznec> thorst: would we really lead with SSP only?
13:51:57 <adreznec> I think we'd also want localdisk in the mix
13:52:17 <thorst> adreznec: I think we throw both in
13:52:23 <thorst> see what sticks
13:52:23 <adreznec> That would allow us to run the most basic case of the driver
13:52:25 <adreznec> Ok
13:52:26 <thorst> but we probably develop SSP first.
13:52:30 <efried> There's a matter of staging, in any case. We would probably want to ... yeah.
13:52:31 <thorst> so that we get CI running ASAP
13:53:01 <adreznec> #agreed on including both localdisk and SSP in the first pass of the in-tree driver
13:53:37 <thorst> alright...amazing. We have a plan on those.
13:53:46 <adreznec> Yep
13:53:49 <thorst> #action thorst to update powervm blueprint
13:53:57 <thorst> does that actually do anything?
13:54:03 <adreznec> Are there any other things we need to decide in the blueprint?
13:54:10 <adreznec> It should in the meeting minutes
13:54:12 <thorst> not sure...probably, but I haven't looked.
13:54:13 <efried> I think it makes stuff appear in a different font in the meeting minutes.
13:54:14 <adreznec> (if we get meeting minutes)
13:54:21 <thorst> well, haven't looked in depth. E-mail hell.
13:54:34 <adreznec> Looking at comments now
13:54:56 <adreznec> First one up was about an overview of old powervm vs powervc vs powerkvm vs new powervm driver
13:55:19 <thorst> yeah, I can put that stuff in. None of this is really heartburn.
13:55:23 <adreznec> I think a couple lines on that is fine
13:55:50 <efried> I responded to a few of the comments with links to the WIP change sets.
13:56:00 <thorst> and some we already discussed in unconference...so I think we're really OK here.
13:56:09 <adreznec> Yeah
13:56:10 <adreznec> Ok
13:56:14 <thorst> next topic?
13:56:19 <efried> There's probably only three or four comments that need some nontrivial text added.
13:56:21 <adreznec> #topic Next steps on stabilizing CI
13:56:29 <esberglu> I have a couple things for that
13:56:36 <adreznec> So esberglu once we land the updated local2remote patch, what's next?
13:57:02 <esberglu> adreznec: I just saw your comment about disabling stable/mitaka runs. stable/mitaka is not compat. with 16.04, which we have now moved to
13:57:15 <adreznec> Ah, right
13:57:48 <adreznec> Ok, I think I'm ok with dropping that from CI runs...
13:58:13 <esberglu> But also I think there is another issue. The run where we discovered the above remote thing only took 1 hour. Some are still taking forever / timing out
13:58:17 <adreznec> We'd be stuck with it through the next cycle without CI going, but... I'm not sure it's a big deal
13:58:59 <esberglu> I think there is a devstack config option to force runs even though devstack hasn't been tested on 16.04
13:59:20 <adreznec> Yeah
13:59:21 <esberglu> If we want to try that on staging at some point and see what happens
13:59:25 <adreznec> You can always force the run
13:59:36 <thorst> adreznec esberglu: we need 16.04 because OSA, right?
13:59:40 <esberglu> Yeah
13:59:45 <adreznec> Not sure it's worth the headache down the road
13:59:49 <adreznec> Well and for Ocata
14:00:00 <adreznec> Ocata isn't going to support trusty for most projects by the end of the cycle
14:00:41 <adreznec> So we'd be here in a month or two anyway
14:01:15 <thorst> OK - yeah, I'm OK with that.
14:01:22 <thorst> unfortunate, but OK.
14:01:28 <thorst> can't do something like that when in tree...
14:01:29 <efried> So the timeouts appear to be related to our multiplexed image upload algorithm.
14:01:39 <adreznec> Yeah
14:01:50 <thorst> efried: when you say multiplexed...
14:01:51 <efried> We need a deeper debug (I'm probably on the hook for that); but I think a broader design discussion may be in order.
14:01:57 <thorst> do you mean my code or your marker lu thing?
14:01:58 <adreznec> We'll need to figure out handling multiple image "flavors" for different branches down the road
14:02:07 <efried> probably the wrong term. I mean the marker LU thing.
14:02:25 <adreznec> Fortunately we have ~2 years to figure that out, probably
14:02:25 <thorst> efried: and how much of that is due to marker lu or the fact that the file never actually uploads (my thing)
14:03:01 <efried> thorst, you mean the thing that _just_ happened? Not related. The marker-based upload stuff behaves properly in that scenario.
14:03:07 <efried> Which is why this is kinda bizarre.
14:03:16 <esberglu> adreznec: That multiple flavor thing will be a piece of cake once zuul v3 comes out
14:03:19 <efried> It should be behaving the same on any other kind of failure.
14:03:23 <adreznec> Yep
14:03:40 <adreznec> That's why I don't think it's worth chasing now
14:03:54 <adreznec> When we get more complex config (static nodes, etc) with zuulv3
14:03:58 <thorst> so revisit when we have things a bit more stable (patch landed)
14:04:49 <thorst> ready to move onto the issues that wangqwsh is hitting?
14:04:54 <adreznec> Ok, we'll need to have a deeper dive into this once we land the local2remote stuff
14:04:59 <efried> Wait
14:05:08 <efried> aren't we still discussing the upload hangs?
14:05:12 * thorst waiting...
14:05:20 <adreznec> Yes
14:05:50 <efried> 1) I wonder if we need to move the marker LU *creation* inside the try/finally; 2) I wonder if we somehow need to handle the scenario where deleting the marker LU fails; but most profoundly, 3) should we consider a timeout of some kind, where I can delete a marker LU I didn't create if a certain amount of time has elapsed? (scary)
14:06:37 <efried> It's possible 3a) we can detect whether the real image LU hasn't been created for "a while" and act then.
14:06:39 <thorst> a timeout scares the hell out of me
14:06:55 <thorst> a timeout where we see no progress being made doesn't scare me
14:06:57 <efried> Yeah, there's no way we can really set expectation for the speed of the actual upload.
14:07:07 <thorst> well, if we see any bytes moving...then ok
14:07:12 <thorst> but do we even get that visibility?
14:07:19 <efried> Yeah, I don't know if there's a way to detect how much of the upload has happened.
14:08:10 <adreznec> Do we really have visibility into the rate of data happening in the upload?
14:08:43 <efried> The schema doesn't provide anything but the capacity as far as the LU itself is concerned.
14:09:03 <efried> And remember, the whole point is that the upload is happening from a different nvl that we can't talk to (except through the SSP).
14:09:29 <efried> So... we could theoretically use the marker LU as a message bus. This gets pretty complicated.
14:09:50 <efried> Have the owner of the marker LU write heartbeats of some kind, and the other guys read the heartbeats.
14:10:07 <efried> Now we have clock sync problems and everything; but we can get around that.
14:10:13 <thorst> can you see a last touched thing?
14:10:28 <thorst> get a time that the marker was last touched and have the one uploading actually touch the marker
14:10:49 <efried> Not via REST. Would at the very least have to map & mount it.
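The "timeout only when no progress is seen" idea can be sketched in Python in a way that sidesteps clock sync, assuming the marker owner writes a value that changes on every heartbeat (for example a counter) and that waiters have some way to read it. That read path is the open question in the discussion and is not modeled here. MarkerWatcher is a hypothetical name, not part of pypowervm; it compares readings only against the waiter's own monotonic clock.

```python
import time


class MarkerWatcher:
    """Waiter-side staleness check for a marker heartbeat (hypothetical)."""

    def __init__(self, stale_after, clock=time.monotonic):
        self.stale_after = stale_after  # seconds without change before giving up
        self._clock = clock             # local clock only; no cross-host time
        self._last_value = None
        self._last_change = None

    def observe(self, heartbeat_value):
        """Record one reading; return True if the marker looks abandoned."""
        now = self._clock()
        if self._last_change is None or heartbeat_value != self._last_value:
            # First reading, or the owner made progress: restart the timer.
            self._last_value = heartbeat_value
            self._last_change = now
            return False
        return (now - self._last_change) >= self.stale_after
```

A waiter would call observe() on each poll and only consider deleting someone else's marker after it returns True, so a slow but live upload is never mistaken for a dead one as long as its heartbeat keeps changing.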
14:10:52 <thorst> though, we get into the same lock contention we'd be in otherwise
14:10:55 <thorst> whoa, no mounts
14:11:11 <adreznec> Ew ew ew
14:11:16 <efried> Maybe not mount.
14:11:25 <efried> What metadata does linux provide on a mapped device?
14:11:54 <efried> So yeah, not map, but read.
14:12:16 <efried> Basically have raw, dd-able data written by the marker owner, read by the waiters.
14:12:17 <thorst> should we table that for further discussion? I want to make sure we get to wangqwsh's item because it is late for him
14:12:25 <thorst> we can swing back to it?
14:12:26 <efried> sure.
14:12:33 <adreznec> Ok
14:12:37 <efried> If I propose a patch for 1 & 2...
14:12:39 <adreznec> esberglu: any other CI stabilizing topics?
14:12:44 <efried> Can it be tested without merging it?
14:12:50 <adreznec> efried: that would probably be a good place to start discussion
14:12:53 <adreznec> and I think we could?
14:12:55 <efried> k
14:13:19 <esberglu> I think that's it
14:13:44 <adreznec> Ok
14:14:00 <adreznec> #topic OpenStack-Ansible CI bring-up
14:14:09 <adreznec> wangqwsh: thorst the floor is yours
14:14:18 <adreznec> Oh, right
14:14:34 <adreznec> #action efried to start proposing discussion patches on marker LU enhancements
14:14:40 <adreznec> As you were
14:14:45 <efried> (#1 is kinda dead in the water, alas)
14:15:48 <thorst> alright. wangqwsh I think you were seeing odd Configparser import issues due to the use of the local2remote patch in your OSA CI
14:15:59 <thorst> as of last night when we discussed, we didn't really have a plan...
14:16:14 <adreznec> I think we have two options
14:16:29 <thorst> wangqwsh: did you make any progress on it or is that still the latest?
14:16:43 <wangqwsh> no progress...
14:16:43 <adreznec> Either install configparser into the nova-master venv for the compute node
14:16:54 <adreznec> Or make it so the local2remote patch doesn't require configparser
14:17:02 <thorst> let me look at that patch...
14:17:28 <thorst> ewww...the patch has a tab in it!
14:17:44 <wangqwsh> repo container builds the wheels.
14:18:13 <adreznec> wangqwsh: that doesn't really matter, we could patch it in post-build
14:18:28 <thorst> adreznec efried: I feel like ConfigParser could easily be removed...for something more trivial. Though it's probably a few hours of work.
14:19:03 <thorst> are we using the 'setup.ini'?
14:19:05 <adreznec> Hmm ok
14:19:09 <adreznec> Let me look at the patch
14:19:11 <thorst> sorry...pypowervm.ini
14:20:13 <efried_otm> I could rewrite the config parsing if I had to. not trivial.
14:20:30 <thorst> yeah, but are we even using it...
14:20:36 <thorst> it looks like it could fall back to nothing
14:20:56 <efried_otm> other than in the patch? I don't think so.
14:21:32 <efried_otm> ugh, and do the discovery every time, which is slow.
14:21:41 <thorst> efried_otm: yeah, but
14:21:47 <thorst> discovery when you start the adapter.
14:21:47 <efried_otm> but yeah, that's the east path.
14:21:51 <thorst> which is once.
14:21:59 <thorst> maybe twice.
14:22:04 <efried_otm> easy*
14:23:30 <adreznec> Hmm
14:23:37 <thorst> flip side...
14:23:45 <thorst> how hard is it to add that dependency to the container?
14:24:00 <thorst> adreznec / esberglu?
14:24:10 <adreznec> Let's see
14:24:20 <adreznec> So the actual action of adding it to the container is really easy, right?
14:24:48 <adreznec> The path is consistent, so we'd source /path/to/nova/venv/bin/activate
14:25:05 <adreznec> and pip install configparser == v.whatever there
14:25:11 <adreznec> timing might be more complicated
14:26:10 <adreznec> wangqwsh: at what point in the run were you seeing the failure?
14:26:49 <adreznec> Would it be enough to let the OSA AIO finish running, then before we kick tempest, patch configparser into the venv, restart nova-compute, validate it comes up, then do the tempest run?
14:26:50 <wangqwsh> start the nova-compute service, it printed
14:27:16 <thorst> but will OSA be OK if nova-compute just dies
14:27:23 <adreznec> I think so
14:27:28 <adreznec> Easy to test locally
14:27:28 <wangqwsh> yes
14:27:42 <adreznec> I'll just break my driver settings, kick off an AIO and find out :)
14:27:55 <adreznec> I think it will though
14:28:10 <adreznec> I don't think it checks service state for long enough to notice the failure
14:28:32 <adreznec> Ok, so 2 minutes left here
14:29:22 <adreznec> wangqwsh: do you want to try patching configparser into the venv and seeing if nova-compute works?
14:29:53 <adreznec> Should just need to run "source /openstack/venvs/nova-master/bin/activate" and "pip install configparser"
14:29:58 <wangqwsh> how to install the pkg? via pip?
14:29:59 <adreznec> Then restart nova-compute
14:30:16 <adreznec> I'll test the driver breakage situation on my AIO
14:30:20 <adreznec> To see if that timing would work
14:30:26 <wangqwsh> the pip config was changed to repo container.
14:30:37 <adreznec> Ah, and configparser isn't there?
14:30:50 <wangqwsh> pip install would not find the pkg.
14:30:50 <wangqwsh> yes
14:30:55 <adreznec> Hmm ok
14:31:06 <adreznec> That's inconvenient
14:31:38 <thorst> I'm wondering if maybe we try to remove the dependency in pypowervm... Maybe wangqwsh could drive that and efried could review?
14:31:47 <thorst> just seems...simpler...
14:31:48 <adreznec> I wonder if we could just patch in configparser as an additional dependency for nova-powervm in CI runs
14:32:10 <adreznec> and then it would just end up in the venv
14:32:54 <wangqwsh> repo container builds the wheels using openstack requirement files.
14:33:30 <thorst> wangqwsh: want to try those two approaches? 1) Change the nova-powervm requirements to include Configparser (I find that eww) and 2) work on the local2remote patch with efried to see how to remove that dependency
14:34:10 <wangqwsh> ok
14:34:10 <adreznec> wangqwsh: right, we could basically add configparser to the list of requirements needed for nova-powervm, but only for CI runs
14:34:17 <adreznec> Sure
14:34:23 <adreznec> I'll take #1
14:34:38 <adreznec> #action adreznec to test patching nova-powervm requirements to include configparser in OSA CI runs
14:34:58 <adreznec> #action wangqwsh and efried to evaluate removing configparser dependency from local2remote patch
14:35:17 <adreznec> #topic Future meetings
14:35:25 <adreznec> So I think this has been pretty productive
14:35:27 <thorst> I liked this. We should do it again
14:35:34 <adreznec> What do you guys think about doing this weekly
14:35:39 <adreznec> I can get something scheduled
14:35:41 <thorst> +2
14:35:51 <adreznec> efried: esberglu wangqwsh ^^ ??
14:35:57 <thorst> we should get a wiki out there too, like the nova meetings.
14:36:01 <adreznec> Right
14:36:04 <adreznec> that was my plan
14:36:10 <adreznec> Formalize this as a driver meeting
14:36:19 <esberglu> Yeah I think this was better than phone calls
14:36:33 <esberglu> Plus there is now a chat history
14:36:34 <adreznec> Cool
14:36:35 <wangqwsh> if #1 does not work, i can try #2
14:36:35 <wangqwsh> sure
14:36:35 <wangqwsh> 1 question:
14:37:05 <adreznec> Does this time slot work for people?
14:37:14 <thorst> works for me
14:37:24 <thorst> unless one of us has to SDB present...
14:37:32 <adreznec> Right
14:37:36 <wangqwsh> hscipaddress issue
14:37:39 <adreznec> Which would be an issue in 2 weeks
14:37:48 <adreznec> Ok, I'll look at calendars
14:39:19 <wangqwsh> thorst: do you mean the hscipaddress works for you?
14:39:49 <thorst> wangqwsh: I think that will go away with the ConfigParser dependency
14:39:59 <thorst> I only saw that error once, so I think it was an anomaly
14:40:41 <adreznec> Yeah
14:40:45 <adreznec> I think that was a timing issue
14:40:54 <wangqwsh> ok,
14:41:02 <wangqwsh> i will try it again
14:41:16 <adreznec> #action adreznec to schedule weekly driver team meeting
14:41:28 <adreznec> All right, I think we're done here?
14:41:32 <adreznec> And we're over time
14:41:37 <adreznec> Thanks everyone!
14:41:39 <thorst> damn...almost made it
14:41:41 <thorst> thx!
14:41:43 <adreznec> #endmeeting
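
For the action item on removing the configparser dependency, here is a minimal Python sketch of the "fall back to nothing" idea raised at 14:20:36: treat the parser as optional and return no cached settings when it cannot be imported, so the caller would run discovery once at adapter start instead. load_settings, the "session" section name, and the "host" key in the usage note are hypothetical; the actual layout of pypowervm.ini and the local2remote patch are not shown in this log.

```python
try:
    # Standard library on Python 3; on Python 2 this name comes from a
    # separately installed backport package.
    import configparser
except ImportError:
    configparser = None


def load_settings(ini_path, section="session", parser_module=configparser):
    """Return settings from `ini_path` as a dict, or {} if unavailable.

    An empty dict signals "nothing cached"; the caller is assumed to fall
    back to discovery in that case.
    """
    if parser_module is None:
        return {}
    parser = parser_module.ConfigParser()
    if not parser.read(ini_path):
        # Missing or unreadable file: read() returns an empty list.
        return {}
    if not parser.has_section(section):
        return {}
    return dict(parser.items(section))
```

With a file containing a [session] section and a line such as host = 1.2.3.4, load_settings would return {"host": "1.2.3.4"}; with the parser missing it returns {} and nothing else in the caller needs to change.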