#openstack-powervm log

13:33:44 <adreznec> #startmeeting CI Scrum
13:33:45 <openstack> Meeting started Wed Nov  2 13:33:44 2016 UTC and is due to finish in 60 minutes.  The chair is adreznec. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:33:45 <efried> With good intentions, of course.
13:33:46 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:33:49 <openstack> The meeting name has been set to 'ci_scrum'
13:33:55 <thorst> whoa.
13:33:59 <adreznec> #topic Overview of current status
13:34:01 <efried> noyce.
13:34:13 <adreznec> Hmm
13:34:34 <adreznec> Anyway
13:34:51 <adreznec> I'll turn the floor over to thorst
13:34:53 <thorst> OK - so the CI runs on Tempest VMs.  They run remote to the NovaLink.  So the fact that I made them with a file path is what was broken
13:35:20 <thorst> unfortunately we didn't catch it cause CI was down, and now we can't get the CI running until we get a fix in
13:35:31 <thorst> here are my thoughts...
13:36:00 <thorst> we could add a config option to go back to the old way.  I think for the image size it's fine.  We needed this new path for when people like kriskend and seroyer deploy twenty 20-gig images
13:36:11 <thorst> but our CI is doing very small images, and most are linked clones (though not all)
13:36:28 <efried> So a secret config option or a public one?
13:36:39 <efried> (I'm down with the idea, btw.)
13:36:47 <adreznec> We're open source, hard for it to be a secret config option :P
13:36:47 <thorst> efried: I think it should be public
13:37:00 <esberglu> thorst: We are moving to large images once we add in the OSA stuff
13:37:02 <thorst> and I think that we need it public for our new in tree driver too.
13:37:19 <thorst> esberglu: larger for the under cloud, but not the VMs that tempest deploys
13:37:26 <efried> So we're talking about reinstating the IterableToFileAdapter and all that
13:37:30 <thorst> the under cloud would actually use the new path...because it is running on the novalink
13:37:40 <esberglu> No larger for the vms. OSA nodes need 50G free space
13:37:45 <thorst> efried: well, I replaced it with a ChunkyFileIter in an old path...
13:38:07 <efried> in an old patch on the same change set?
13:38:10 <adreznec> esberglu: yeah, but not for the guest VMs deployed as part of the CI run
13:38:11 <esberglu> NM i'm dumb
13:38:11 <thorst> esberglu: run...but that's all VMs that the under cloud provisions, not what the Tempest running in the VM provisions
13:38:25 <esberglu> Yeah I was confused by what you meant
13:38:26 <thorst> run -> right
13:38:28 <esberglu> I get it now
13:38:52 <efried> Nope.  In an abandoned change set?
13:39:02 * adreznec can't help but laugh every time he sees ChunkFileIter
13:39:06 <adreznec> *Chunky
13:39:13 <thorst> efried: yeah, it's actually a previous version of what merged
13:39:22 <thorst> but what I'm not sure about is what to call this opt
13:39:28 <thorst> "compat_upload_mode" or something
13:39:32 <adreznec> Anyway. So basically this configopt would provide a way to revert back to that not-quite-as-old behavior
13:40:08 <thorst> right.
13:40:10 <adreznec> I guess it depends on whether we want this configopt to be specific for toggling to the old upload function
13:40:28 <adreznec> Or if we want it to be a development-only configopt for allowing the driver to work remotely in the future
13:40:37 <thorst> hmmm
13:40:39 <thorst> that's kinda neat.
13:40:44 <adreznec> Where we could have other things this configopt "fixes"
13:40:56 <thorst> but whatever it is, it will need to persist into the future in-tree driver too
13:41:00 <adreznec> Something like "remote_driver_dev"
13:41:30 <thorst> adreznec: slippery slope...next thing you know we'll have config opts for remote ips
13:41:33 <thorst> and look like a remote driver.
13:41:37 <efried> Yeah, that's what I meant by a "private" option - basically either undocumented, or documented as "don't use this unless you're us"
13:41:57 <adreznec> thorst: yeah
13:42:00 <adreznec> Idk
13:42:04 <thorst> efried adreznec: another idea...
13:42:13 <thorst> is there a way we could ... hide this somehow?
13:42:24 <thorst> put a pypowervm patch in that says 'no, you don't get to do that'
13:42:34 <efried> I was looking at it, and I couldn't see a way to do it easily.
13:42:53 <thorst> we'd have to read in from the file path...and then pipe that into the REST layer
13:43:07 <efried> Or we put the IterableToFileAdapter (or the artist formerly known as) into pypowervm.
13:43:21 <thorst> efried: well, really a FileToIterableAdapter
13:43:24 <efried> And then pypowervm detects remote and overrides the specified option, ignoring the function.
13:43:40 <thorst> well, you need the function...cause that's the only interlock into glance
13:44:08 <thorst> I kinda prefer that cause we need to patch the pypowervm in OSA already...
13:44:16 <thorst> and in our devstack...
13:44:31 <thorst> it's trickier but I think it prevents slipperiness
13:44:38 <efried> oh, you want this as part of local2remote patch, not a permanent fixture in pypowervm?
13:44:44 <efried> I guess that makes sense.
13:44:45 <thorst> efried: right.
13:44:52 <adreznec> Yeah
13:45:21 <adreznec> Basically this would be another library tweak for CI
13:45:24 <thorst> those were the only two ways I could think of...  Config opt or hide it in pypowervm local2remote.  I prefer the second cause it will just work with everything and limit it to our CI
13:45:43 <efried> btw, does local2remote become moot if we decide to support remote pypowervm officially?
13:45:50 <thorst> efried: nah
13:46:02 <thorst> because it's really a question of whether or not we support nova-powervm remotely
13:46:15 <thorst> which we don't, except for CI (to allow scale)
13:46:42 <adreznec> Right
13:46:56 <efried> Okay, so presumably...
13:46:56 <efried> #action thorst to propose the local2remote patch to make this work
13:46:56 <efried> ?
13:47:01 <adreznec> And I can't see a compelling reason to want to outside development... kind of defeats the purpose of the driver
13:47:14 <thorst> can we swap the owner to efried?
13:47:20 <thorst> cause I want a different action later in meeting  :-D
13:47:51 <thorst> (I want to spend time updating the nova-powervm proposal rst)
13:47:55 <efried> You'd be quicker at it, but if you show me this interim thing you mentioned (which I have not been able to find), I'll take it on.
13:48:05 <thorst> OK - we'll work it together.
13:48:25 <thorst> apearson would be the quickest, but he's decided to be in Europe and be afk
13:48:26 <efried> #agreed thorst & efried to work the local2remote patch
13:48:27 <adreznec> #action efried and thorst work together to bring harmony to the CI system
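Sketch of the "read in from the file path and pipe that into the REST layer" idea (the FileToIterableAdapter thorst mentions above). This is only a minimal Python illustration, assuming the adapter just has to turn a local file into an iterable of byte chunks. The name iter_file_chunks and the 64 KiB default chunk size are hypothetical and are not taken from pypowervm or the local2remote patch.

```python
import os
import tempfile


def iter_file_chunks(path, chunk_size=64 * 1024):
    """Yield the contents of the file at `path` as successive byte chunks.

    Hypothetical helper: a consumer that expects an iterable of chunks
    (rather than a file path) can be fed from this generator.
    """
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk


if __name__ == "__main__":
    # Usage: write a small temporary file and stream it back in 4-byte chunks.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"0123456789")
    try:
        print(list(iter_file_chunks(tmp.name, chunk_size=4)))
    finally:
        os.remove(tmp.name)
```

Reading lazily keeps memory use bounded by the chunk size, which matters less for the small CI images than for the 20-gig case mentioned earlier.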
13:48:49 <adreznec> Ok, so that would get applied as part of our existing patch path then
13:48:53 <efried> yuh
13:48:55 <thorst> so that brings to second point
13:48:58 <adreznec> No new code required from esberglu so far
13:49:01 <efried> yuh
13:49:04 <thorst> how are we going to do our CI for proper nova
13:49:10 <thorst> I think this patch solves one aspect of it
13:49:15 <adreznec> #topic In-tree driver CI discussion
13:49:37 <thorst> but the other was that I was planning on localdisk for in-tree.  I think we need SSP at a minimum for CI 'harmony'
13:49:54 <adreznec> Yeah, a bit of a wrinkle
13:50:14 <apearson> @thorst - so I don't have to read through a ton (yeah, I'm lazy), is there a short summary I can look at to help?
13:50:15 <thorst> but...I think it's not that awful?  We could lead with SSP...
13:50:32 <thorst> apearson: don't worry about it - I was just poking on how you're supposedly afk
13:50:53 <apearson> oh fine - I know when I'm not wanted...
13:51:03 <adreznec> #link https://review.openstack.org/#/c/381772/ <-- Driver blueprint
13:51:38 <adreznec> thorst: would we really lead with SSP only?
13:51:57 <adreznec> I think we'd also want localdisk in the mix
13:52:17 <thorst> adreznec: I think we throw both in
13:52:23 <thorst> see what sticks
13:52:23 <adreznec> That would allow us to run the most basic case of the driver
13:52:25 <adreznec> Ok
13:52:26 <thorst> but we probably develop SSP first.
13:52:30 <efried> There's a matter of staging, in any case.  We would probably want to ... yeah.
13:52:31 <thorst> so that we get CI running ASAP
13:53:01 <adreznec> #agreed on including both localdisk and SSP in the first pass of the in-tree driver
13:53:37 <thorst> alright...amazing.  We have a plan on those.
13:53:46 <adreznec> Yep
13:53:49 <thorst> #action thorst to update powervm blueprint
13:53:57 <thorst> does that actually do anything?
13:54:03 <adreznec> Are there any other things we need to decide in the blueprint?
13:54:10 <adreznec> It should in the meeting minutes
13:54:12 <thorst> not sure...probably, but I haven't looked.
13:54:13 <efried> I think it makes stuff appear in a different font in the meeting minutes.
13:54:14 <adreznec> (if we get meeting minutes)
13:54:21 <thorst> well, haven't looked in depth.  E-mail hell.
13:54:34 <adreznec> Looking at comments now
13:54:56 <adreznec> First one up was about an overview of old powervm vs powervc vs powerkvm vs new powervm driver
13:55:19 <thorst> yeah, I can put that stuff in.  None of this is really heart burn.
13:55:23 <adreznec> I think a couple lines on that is fine
13:55:50 <efried> I responded to a few of the comments with links to the WIP change sets.
13:56:00 <thorst> and some we already discussed in unconference...so I think we're really OK here.
13:56:09 <adreznec> Yeah
13:56:10 <adreznec> Ok
13:56:14 <thorst> next topic?
13:56:19 <efried> There's probably only three or four comments that need some nontrivial text added.
13:56:21 <adreznec> #topic Next steps on stabilizing CI
13:56:29 <esberglu> I have a couple things for that
13:56:36 <adreznec> So esberglu once we land the updated local2remote patch, what's next?
13:57:02 <esberglu> adreznec: I just saw your comment about disabling stable/mitaka runs. stable/mitaka is not compat. with 16.04, which we have now moved to
13:57:15 <adreznec> Ah, right
13:57:48 <adreznec> Ok, I think I'm ok with dropping that from CI runs...
13:58:13 <esberglu> But also I think there is another issue. The run where we discovered the above remote thing only took 1 hour. Some are still taking forever / timing out
13:58:17 <adreznec> We'd be stuck with it through the next cycle without CI going, but... I'm not sure it's a big deal
13:58:59 <esberglu> I think there is a devstack config option to force runs even though devstack hasn't been tested on 16.04
13:59:20 <adreznec> Yeah
13:59:21 <esberglu> If we want to try that on staging at some point and see what happens
13:59:25 <adreznec> You can always force the run
13:59:36 <thorst> adreznec esberglu: we need 16.04 because OSA, right?
13:59:40 <esberglu> Yeah
13:59:45 <adreznec> Not sure it's worth the headache down the road
13:59:49 <adreznec> Well and for Ocata
14:00:00 <adreznec> Ocata isn't going to support trusty for most projects by the end of the cycle
14:00:41 <adreznec> So we'd be here in a month or two anyway
14:01:15 <thorst> OK - yeah, I'm OK with that.
14:01:22 <thorst> unfortunate, but OK.
14:01:28 <thorst> can't do something like that when in tree...
14:01:29 <efried> So the timeouts appear to be related to our multiplexed image upload algorithm.
14:01:39 <adreznec> Yeah
14:01:50 <thorst> efried: when you say multiplexed...
14:01:51 <efried> We need a deeper debug (I'm probably on the hook for that); but I think a broader design discussion may be in order.
14:01:57 <thorst> do you mean my code or your marker lu thing?
14:01:58 <adreznec> We'll need to figure out handling multiple image "flavors" for different branches down the road
14:02:07 <efried> probably the wrong term.  I mean the marker LU thing.
14:02:25 <adreznec> Fortunately we have ~2 years to figure that out, probably
14:02:25 <thorst> efried: and how much of that is due to marker lu or the fact that the file never actually uploads (my thing)
14:03:01 <efried> thorst, you mean the thing that _just_ happened?  Not related.  The marker-based upload stuff behaves properly in that scenario.
14:03:07 <efried> Which is why this is kinda bizarre.
14:03:16 <esberglu> adreznec: That multiple flavor thing will be a piece of cake once zuul v3 comes out
14:03:19 <efried> It should be behaving the same on any other kind of failure.
14:03:23 <adreznec> Yep
14:03:40 <adreznec> That's why I don't think it's worth chasing now
14:03:54 <adreznec> When we get more complex config (static nodes, etc) with zuulv3
14:03:58 <thorst> so revisit when we have things a bit more stable (patch landed)
14:04:49 <thorst> ready to move onto the issues that wangqwsh is hitting?
14:04:54 <adreznec> Ok, we'll need to have a deeper dive into this once we land the local2remote stuff
14:04:59 <efried> Wait
14:05:08 <efried> aren't we still discussing the upload hangs?
14:05:12 * thorst waiting...
14:05:20 <adreznec> Yes
14:05:50 <efried> 1) I wonder if we need to move the marker LU *creation* inside the try/finally; 2) I wonder if we somehow need to handle the scenario where deleting the marker LU fails; but most profoundly, 3) should we consider a timeout of some kind, where I can delete a marker LU I didn't create if a certain amount of time has elapsed? (scary)
14:06:37 <efried> It's possible 3a) we can detect whether the real image LU hasn't been created for "a while" and act then.
14:06:39 <thorst> a timeout scares the hell out of me
14:06:55 <thorst> a timeout where we see no progress being made doesn't scare me
14:06:57 <efried> Yeah, there's no way we can really set expectation for the speed of the actual upload.
14:07:07 <thorst> well, if we see any bytes moving...then ok
14:07:12 <thorst> but do we even get that visibility?
14:07:19 <efried> Yeah, I don't know if there's a way to detect how much of the upload has happened.
14:08:10 <adreznec> Do we really have visibility into the rate of data happening in the upload?
14:08:43 <efried> The schema doesn't provide anything but the capacity as far as the LU itself is concerned.
14:09:03 <efried> And remember, the whole point is that the upload is happening from a different nvl that we can't talk to (except through the SSP).
14:09:29 <efried> So... we could theoretically use the marker LU as a message bus.  This gets pretty complicated.
14:09:50 <efried> Have the owner of the marker LU write heartbeats of some kind, and the other guys read the heartbeats.
14:10:07 <efried> Now we have clock sync problems and everything; but we can get around that.
14:10:13 <thorst> can you see a last touched thing?
14:10:28 <thorst> get a time that the marker was last touched and have the one uploading actually touch the marker
14:10:49 <efried> Not via REST.  Would at the very least have to map & mount it.
14:10:52 <thorst> though, we get into the same lock contention we'd be in otherwise
14:10:55 <thorst> whoa, no mounts
14:11:11 <adreznec> Ew ew ew
14:11:16 <efried> Maybe not mount.
14:11:25 <efried> What metadata does linux provide on a mapped device?
14:11:54 <efried> So yeah, not map, but read.
14:12:16 <efried> Basically have raw, dd-able data written by the marker owner, read by the waiters.
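Sketch of the "timeout only when no progress is being made" idea combined with the heartbeat proposal above. This is a hedged Python illustration, not anything from nova-powervm or pypowervm: the class name MarkerStallDetector and the read_heartbeat callable are hypothetical, and how a waiter would actually read the owner's heartbeat off the marker LU is left open, as in the discussion. The waiter only watches for the value changing and measures elapsed time on its own monotonic clock, which is one way around the clock sync problem.

```python
import time


class MarkerStallDetector:
    """Decide whether a marker's owner has stopped making progress.

    Hypothetical sketch: `read_heartbeat` is any callable returning the
    latest value the owner wrote (for example a counter). Only changes in
    that value matter, and elapsed time is measured on the waiter's own
    monotonic clock, so owner and waiter clocks need not be in sync.
    """

    def __init__(self, read_heartbeat, stall_timeout, clock=time.monotonic):
        self._read = read_heartbeat
        self._timeout = stall_timeout
        self._clock = clock
        self._last_value = None
        self._last_change = clock()

    def is_stalled(self):
        value = self._read()
        now = self._clock()
        if value != self._last_value:
            # Progress observed: remember it and restart the stall window.
            self._last_value = value
            self._last_change = now
            return False
        return (now - self._last_change) >= self._timeout
```

A waiter would poll is_stalled() and only consider removing someone else's marker once it returns True. The stall window says nothing about upload speed, only about whether the owner is still alive.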
14:12:17 <thorst> should we table that for further discussion?  I want to make sure we get to wangqwsh's item because it is late for him
14:12:25 <thorst> we can swing back to it?
14:12:26 <efried> sure.
14:12:33 <adreznec> Ok
14:12:37 <efried> If I propose a patch for 1 & 2...
14:12:39 <adreznec> esberglu: any other CI stabilizing topics?
14:12:44 <efried> Can it be tested without merging it?
14:12:50 <adreznec> efried: that would probably be a good place to start discussion
14:12:53 <adreznec> and I think we could?
14:12:55 <efried> k
14:13:19 <esberglu> I think that's it
14:13:44 <adreznec> Ok
14:14:00 <adreznec> #topic OpenStack-Ansible CI bring-up
14:14:09 <adreznec> wangqwsh: thorst the floor is yours
14:14:18 <adreznec> Oh, right
14:14:34 <adreznec> #action efried to start proposing discussion patches on marker LU enhancements
14:14:40 <adreznec> As you were
14:14:45 <efried> (#1 is kinda dead in the water, alas)
14:15:48 <thorst> alright.  wangqwsh I think you were seeing odd Configparser import issues due to the use of the local2remote patch in your OSA CI
14:15:59 <thorst> as of last night when we discussed, we didn't really have a plan...
14:16:14 <adreznec> I think we have two options
14:16:29 <thorst> wangqwsh: did you make any progress on it or is that still the latest?
14:16:43 <wangqwsh> no progress...
14:16:43 <adreznec> Either install configparser into the nova-master venv for the compute node
14:16:54 <adreznec> Or make it so the local2remote patch doesn't require configparser
14:17:02 <thorst> let me look at that patch...
14:17:28 <thorst> ewww...the patch has a tab in it!
14:17:44 <wangqwsh> repo container builds the wheels.
14:18:13 <adreznec> wangqwsh: that doesn't really matter, we could patch it in post-build
14:18:28 <thorst> adreznec efried: I feel like ConfigParser could easily be removed...for something more trivial.  Though its probably a few hours of work.
14:19:03 <thorst> are we using the 'setup.ini'?
14:19:05 <adreznec> Hmm ok
14:19:09 <adreznec> Let me look at the patch
14:19:11 <thorst> sorry...pypowervm.ini
14:20:13 <efried_otm> I could rewrite the config parsing if I had to. not trivial.
14:20:30 <thorst> yeah, but are we even using it...
14:20:36 <thorst> it looks like it could fall back to nothing
14:20:56 <efried_otm> other than in the patch?  I don't think so.
14:21:32 <efried_otm> ugh, and do the discovery every time, which is slow.
14:21:41 <thorst> efried_otm: yeah, but
14:21:47 <thorst> discovery when you start the adapter.
14:21:47 <efried_otm> but yeah, that's the east path.
14:21:51 <thorst> which is once.
14:21:59 <thorst> maybe twice.
14:22:04 <efried_otm> easy*
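Sketch of the "fall back to nothing" path discussed above. This is a hedged Python illustration, assuming the ini file only caches values that discovery could regenerate. The function name load_cached_settings and the section name "cache" are hypothetical, and the real layout of pypowervm.ini is not shown in this log. When the parser module or the file is missing, the function returns an empty dict and the caller would have to run discovery (once, when the adapter starts).

```python
import os

try:
    import configparser  # stdlib on Python 3; the pip backport on Python 2
except ImportError:  # module not installed in this venv
    configparser = None


def load_cached_settings(path, section="cache"):
    """Return cached settings from an ini file, or {} if unavailable.

    Hypothetical sketch: the section name and the idea that the file only
    holds cached discovery results are assumptions. An empty result means
    the caller has to run discovery itself.
    """
    if configparser is None or not os.path.isfile(path):
        return {}
    parser = configparser.ConfigParser()
    parser.read(path)
    if not parser.has_section(section):
        return {}
    return dict(parser.items(section))
```

Making the import optional means the venv no longer needs the extra package, at the cost of a slower start whenever the cache cannot be read.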
14:23:30 <adreznec> Hmm
14:23:37 <thorst> flip side...
14:23:45 <thorst> how hard is it to add that dependency to the container?
14:24:00 <thorst> adreznec / esberglu?
14:24:10 <adreznec> Let's see
14:24:20 <adreznec> So the actual action of adding it to the container is really easy, right?
14:24:48 <adreznec> The path is consistent, so we'd source /path/to/nova/venv/bin/activate
14:25:05 <adreznec> and pip install configparser == v.whatever there
14:25:11 <adreznec> timing might be more complicated
14:26:10 <adreznec> wangqwsh: at what point in the run were you seeing the failure?
14:26:49 <adreznec> Would it be enough to let the OSA AIO finish running, then, before we kick off tempest, patch configparser into the venv, restart nova-compute, validate it comes up, then do the tempest run?
14:26:50 <wangqwsh> start the nova-compute service, it printed
14:27:16 <thorst> but will OSA be OK if nova-compute just dies
14:27:23 <adreznec> I think so
14:27:28 <adreznec> Easy to test locally
14:27:28 <wangqwsh> yes
14:27:42 <adreznec> I'll just break my driver settings, kick off an AIO and find out :)
14:27:55 <adreznec> I think it will though
14:28:10 <adreznec> I don't think it checks service state for long enough to notice the failure
14:28:32 <adreznec> Ok, so 2 minutes left here
14:29:22 <adreznec> wangqwsh: do you want to try patching configparser into the venv and seeing if nova-compute works?
14:29:53 <adreznec> Should just need to run "source /openstack/venvs/nova-master/bin/activate" and "pip install configparser"
14:29:58 <wangqwsh> how to install the pkg? via pip?
14:29:59 <adreznec> Then restart nova-compute
14:30:16 <adreznec> I'll test the driver breakage situation on my AIO
14:30:20 <adreznec> To see if that timing would work
14:30:26 <wangqwsh> the pip config was changed to point at the repo container.
14:30:37 <adreznec> Ah, and configparser isn't there?
14:30:50 <wangqwsh> pip install would not find the pkg.
14:30:50 <wangqwsh> yes
14:30:55 <adreznec> Hmm ok
14:31:06 <adreznec> That's inconvenient
14:31:38 <thorst> I'm wondering if maybe we try to remove the dependency in pypowervm...  Maybe wangqwsh could drive that and efried could review?
14:31:47 <thorst> just seems...simpler...
14:31:48 <adreznec> I wonder if we could just patch in configparser as an additional dependency for nova-powervm in CI runs
14:32:10 <adreznec> and then it would just end up in the venv
14:32:54 <wangqwsh> repo container builds the wheels using openstack requirement files.
14:33:30 <thorst> wangqwsh: want to try those two approaches?  1) Change the nova-powervm requirements to include Configparser (I find that eww) and 2) work on the local2remote patch with efried to see how to remove that dependency
14:34:10 <wangqwsh> ok
14:34:10 <adreznec> wangqwsh: right, we could basically add configparser to the list of requirements needed for nova-powervm, but only for CI runs
14:34:17 <adreznec> Sure
14:34:23 <adreznec> I'll take #1
14:34:38 <adreznec> #action adreznec to test patching nova-powervm requirements to include configparser in OSA CI runs
14:34:58 <adreznec> #action wangqwsh and efried to evaluate removing configparser dependency from local2remote patch
14:35:17 <adreznec> #topic Future meetings
14:35:25 <adreznec> So I think this has been pretty productive
14:35:27 <thorst> I liked this.  We should do it again
14:35:34 <adreznec> What do you guys think about doing this weekly
14:35:39 <adreznec> I can get something scheduled
14:35:41 <thorst> +2
14:35:51 <adreznec> efried: esberglu wangqwsh ^^ ??
14:35:57 <thorst> we should get a wiki out there too, like the nova meetings.
14:36:01 <adreznec> Right
14:36:04 <adreznec> that was my plan
14:36:10 <adreznec> Formalize this as a driver meeting
14:36:19 <esberglu> Yeah I think this was better than phone calls
14:36:33 <esberglu> Plus there is now a chat history
14:36:34 <adreznec> Cool
14:36:35 <wangqwsh> if #1 does not work, I can try #2
14:36:35 <wangqwsh> sure
14:36:35 <wangqwsh> 1 question:
14:37:05 <adreznec> Does this time slot work for people?
14:37:14 <thorst> works for me
14:37:24 <thorst> unless one of us has to SDB present...
14:37:32 <adreznec> Right
14:37:36 <wangqwsh> hscipaddress issue
14:37:39 <adreznec> Which would be an issue in 2 weeks
14:37:48 <adreznec> Ok, I'll look at calendars
14:39:19 <wangqwsh> thorst: do you mean the hscipaddress works for you?
14:39:49 <thorst> wangqwsh: I think that will go away with the ConfigParser dependency
14:39:59 <thorst> I only saw that error once, so I think it was an anomaly
14:40:41 <adreznec> Yeah
14:40:45 <adreznec> I think that was a timing issue
14:40:54 <wangqwsh> ok,
14:41:02 <wangqwsh> i will try it again
14:41:16 <adreznec> #action adreznec to schedule weekly driver team meeting
14:41:28 <adreznec> All right, I think we're done here?
14:41:32 <adreznec> And we're over time
14:41:37 <adreznec> Thanks everyone!
14:41:39 <thorst> damn...almost made it
14:41:41 <thorst> thx!
14:41:43 <adreznec> #endmeeting