13:00:41 #startmeeting powervm_driver_meeting
13:00:42 Meeting started Tue Apr 4 13:00:41 2017 UTC and is due to finish in 60 minutes. The chair is esberglu. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:00:44 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:00:46 The meeting name has been set to 'powervm_driver_meeting'
13:00:48 o/
13:03:11 #topic Out Of Tree Driver
13:03:25 ocata is broken
13:03:34 esberglu: the upload thing?
13:03:43 Yep
13:03:54 crap...how did that get back ported...
13:04:06 I thought we tested it earlier :-/
13:04:10 in the staging env...
13:05:15 So CI is still down. I can redeploy again with newton, or we can get this figured out and be without CI until then
13:05:44 esberglu: well, it seems like we're going to need a new pypowervm for it
13:05:44 I think I may have missed some of the convo yesterday
13:05:52 and we're going to have to bump that back in ocata...which is awful
13:05:54 But it looked like efried was expecting this
13:06:06 well, efried thought this could occur. I didn't think it could
13:06:26 it was based off of whether or not whatever broke us was back ported
13:06:36 efried will be on in 15 min and we can discuss more then
13:07:09 Do you know what broke us?
13:07:19 nope
13:07:25 that's the irony...and what I'm frustrated about
13:07:33 we're trying to fix something without knowing why we broke
13:07:56 Well, let's look at what got into ocata in the last week
13:08:10 +2
13:08:13 Because I deployed ocata on staging at the end of last week no problem
13:10:04 There's nothing that has gone into ocata since I deployed on staging successfully
13:10:13 in the 3 powervm projects
13:10:18 right.
13:10:21 it'd be in nova itself
13:10:26 Yep I'm looking there now
13:10:43 It would have to be a bugfix at that point that broke us, right?
13:10:52 I mean Ocata's been cut for some time now
13:10:57 I'd assume
13:10:58 https://github.com/openstack/nova/commits/stable/ocata
13:11:02 nothing much in the past week there tho
13:11:09 Right...
13:11:24 thought maybe a global req change
13:11:26 None of that looks suspect
13:11:28 but nothing much there
13:11:31 pbr updated...
13:11:47 thorst: yeah, once ocata gets cut, reqs are pretty much frozen
13:11:52 Unless there's a major breaking issue
13:12:32 esberglu: I'm now curious if newton is hosed.
13:13:03 I hope not. That means CI is down for the count until this is resolved
13:13:13 right.
13:13:56 adreznec: does concurrent.futures use greenlet or eventlet?
13:14:09 do you know?
13:15:26 thorst: not sure offhand
13:16:22 wait
13:16:28 isn't concurrent.futures a stdlib
13:17:31 adreznec: yeah, but that's where efried sees us hanging
13:17:32 neither eventlet nor greenlet is a builtin, so it can't use either by default
13:17:48 so efried wants to switch to all eventlet I think...
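For context on the exchange above: concurrent.futures is indeed part of the stdlib (backported to Python 2.7 as the "futures" package) and schedules work on native OS threads, while eventlet provides cooperative green threads. A minimal sketch contrasting the two pools, purely illustrative and not code from the driver:

    # concurrent.futures (stdlib / "futures" backport) uses native OS threads;
    # eventlet.GreenPool uses cooperative green threads that only switch at
    # explicit yield points.
    from concurrent import futures
    import eventlet

    def work(n):
        return n * n

    # Native threads via the stdlib executor.
    with futures.ThreadPoolExecutor(max_workers=4) as pool:
        print(list(pool.map(work, range(4))))

    # Green threads via eventlet.
    green = eventlet.GreenPool(size=4)
    print(list(green.imap(work, range(4))))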
13:18:16 Howdy.
13:18:29 so the net is...I think that is now our highest priority
13:18:31 Not sure if I missed anything, but I have a plan for the broken upload in ocata.
13:18:33 and we should work it first oot
13:18:52 rather than in-tree (so we don't create a bunch of misc reviews for the core side)
13:18:57 efried: net is, CI is down
13:19:17 what I think we're curious about is did something change in OpenStack...or was it perhaps even lower than that
13:19:23 like eventlet or somewhere else
13:19:29 thorst: is this futures in py2.7 or py3?
13:19:53 Staging CI is still up. So if we need to run something through, we can still do it there
13:19:57 For OOT ocata, we need a bug opened; then we a) port to ocata OOT the change that moves from FUNC back to IO_STREAM; b) update the pypowervm requirement to 1.1.1. And, of course, we'll need to release 1.1.1.
13:19:59 I'll admit I'm not totally in the loop on the upload issue
13:20:30 efried: what is the 'fix'
13:20:34 down in pypowervm
13:20:37 Yeah, so something changed in eventlet recently - I still haven't nailed down exactly what, but sdague gave me some vague pointers last week.
13:20:51 well shit
13:20:56 efried: something that would have changed since ocata was released?
13:20:57 that'll affect things way back
13:21:10 The fix is two-sided, unfortunately. In pypowervm, we have to kill coordinated upload. In community, we have to kill FUNC.
13:21:12 otherwise shouldn't it be pinned by version in reqs?
13:21:27 Because there's no way to do FUNC without threads, and there's no way to do coordinated without threads.
13:21:44 The alternative is to retool pypowervm to use eventlet instead of futures.
13:21:50 efried: so is concurrent.futures just dead?
13:22:00 because of a change in eventlet?
13:22:01 No, it's just incompatible with greenlet.
13:22:18 Although that might not be entirely true.
13:22:20 so FUNC is still viable, just not in an OpenStack env.
13:22:40 and it also calls into question if we need a change for your VIOS Task thingy
13:22:46 which also uses concurrent.futures
13:22:55 I don't disagree with that.
13:23:12 non-disagreement is as close as we can hope to get to a resounding agreement
13:23:21 efried: so are you focused on that change today?
13:23:27 and we just keep CI down while we fix that?
13:23:47 Which change? Get rid of FUNC and coordinated in ocata OOT?
13:23:58 Or convert to greenlet?
13:24:23 Perhaps we should take a couple of minutes and go over what I (sort of, maybe) know so far about the underlying cause.
13:24:35 efried: yes, let's do that
13:24:41 So from my research with mdrabe yesterday, I *think* it goes like this:
13:24:53 There are two kinds of multiprocessing models available: threads and greenlets.
13:24:55 So just to clarify
13:25:04 What version of pypowervm is this using
13:25:07 With Ocata
13:25:15 1.1.0?
13:25:17 I don't fully understand the difference between them, but they're totally different animals, not just different implementations on top of the same underlying threading model.
13:25:49 OpenStack uses greenlets throughout. They even have a hacking check in place to make sure you're using eventlet through their nova.utils wrapper of it.
13:26:20 (adreznec, not sure, but to fix this we'll need to release 1.1.1 and bump the ocata req to that. Is that even legal?)
13:26:43 Uh
13:26:44 ocata is using 1.0.0.4 I believe
13:26:51 We can technically do that for Ocata
13:26:53 I guess...
13:27:07 in the future... please, let's never have to deal with that
13:27:08 Yeah, the req bump will need to happen regardless of which way we fix this. Unless we can figure out some as-yet-unknown way to fix it purely in the community code.
13:27:36 I'm just wondering if this is only broken with some combination of versions
13:27:53 e.g. only with pypowervm 1.1.0 because that's where we require futures>=3.0
13:28:08 vs just "futures"
13:28:12 with no version req
13:28:32 Mm. And presumably openstack doesn't require futures?
13:28:39 nope
13:28:44 sorry, yes, they do
13:28:54 that's where the >3.0 req came from
13:29:06 (walking to a meeting)
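For reference, the dependency bump being discussed would amount to something like the following stable/ocata requirements excerpt; the exact pins shown here are illustrative assumptions, not agreed-upon values:

    # Hypothetical requirements.txt excerpt for the stable/ocata backport
    pypowervm>=1.1.1  # would pick up the upload fix once 1.1.1 is released
    futures>=3.0;python_version=='2.7'  # stdlib backport, per global-requirements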
13:29:55 Anyway, threading in python apparently has this GIL (global interpreter lock), which actually makes it so that literally only one thread runs at a time - the others are stopped.
13:30:05 right.
13:30:28 Normally this is okay because threads can yield and allow other threads to run, so as long as your actual programming doesn't have deadlocks in it, you're aaight.
13:30:29 But
13:30:41 This sucker is blocking on a syscall.
13:30:47 Which doesn't yield.
13:31:36 So all the other threads - including the greenlets, including the one that would kick the REST server to do its open, which would unblock the write side - are frozen.
13:31:58 Now, there is apparently a way to explicitly release the GIL.
13:32:30 ?
13:33:03 That might be the least disruptive path, if we can figure out how to do it. But a) it's going to be a hack (more on that in a bit), and b) it might not work in the context we would need to do it in - that is, it might only work if we can do it right at that open() call, which is in code we don't own.
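One general pattern for the situation just described - a blocking syscall starving every green thread in the process - is to hand the blocking call to eventlet's OS-thread pool so the hub keeps running. The sketch below only illustrates that technique; it is not necessarily the fix that was chosen here:

    import eventlet
    from eventlet import tpool

    def blocking_read(fileobj, size):
        # fileobj.read() blocks in a syscall; running it on a real OS thread
        # via tpool lets the other greenlets (e.g. the one that kicks the REST
        # server to open its side of the pipe) keep running.
        return tpool.execute(fileobj.read, size)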
13:33:36 efried: to me...let's just fix it proper...
13:33:43 greenlet (not eventlet)?
13:35:33 Yeah, so I don't know what the difference is there - those are different libs - but they use greenlets / green threads under the covers.
13:35:57 Whereas anything that says "thread" - like the native thread library, or concurrent.futures - uses the other kind of threads.
13:36:37 I'd assume we can use greenlets for most things...but the pipe may not be able to use greenlet...
13:37:06 That's an unknown at this point. But I imagine it's gotta be possible.
13:37:17 ok...
13:37:42 so the net is, due to this, we need to bump pypowervm...get a bug for nova-powervm...and possibly back port this way (enough) back
13:38:00 However, given that we're going to need to release a new pypowervm and bump the OOT req to it anyway, I would just as soon do the fix that avoids threading altogether.
13:38:09 waler is testing on Mitaka, so he could actually probably tell us if Mitaka is impacted :-)
13:38:45 efried: for the upload? Sure. VIOS Feed Task stuff...not so sure
13:39:36 thorst Agree, but I think what saves us there is that what's running in that thread is non-blocking.
13:39:36 alright...so I guess that's priority 1...
13:39:53 yep...I agree that is probably not highly impacted.
13:39:55 So maybe it hitches the process while it's doing that POST, but as soon as the POST comes back, we keep truckin'.
13:40:09 yeah, kinda ick, not uber ick
13:40:12 Not ideal, and perhaps something we should look into for the future, but not first priority, right.
13:40:37 alright...so gameplan build-out here...
13:40:50 1) esberglu - would you be willing to make the bug and tag at least back to ocata
13:41:05 Sure
13:41:21 I honestly suspect that newton / mitaka may still be impacted...would love to know that if you have time to redeploy with newton
13:41:47 2) efried you're updating the pypowervm bits?
13:42:03 3) should I do the nova-powervm bits to swap off FUNC?
13:42:04 thorst: I should be able to do that in the background today
13:43:11 thorst Actually, let me do it.
13:43:19 Take a look at this delta: https://review.openstack.org/#/c/443189/15..16/nova/virt/powervm/disk/ssp.py
13:43:32 yeah
13:43:38 It'll be like that, except we won't actually need the IterableToFileAdapter.
13:43:41 we would need to basically revert into that change for both localdisk
13:43:44 and ssp
13:43:50 why not?
13:43:53 Because I'm gonna make a change to pypowervm ;-)
13:43:59 Since we're going to need a new version of that anyway.
13:44:02 Backward compatible.
13:44:06 don't make it even more complicated :-)
13:44:07 But eliminating the need for IterableToFileAdapter.
13:44:17 It makes it less complicated, really.
13:45:00 The HTTP request expects an iterable. Glance gives us an iterable. For some reason we had pypowervm expecting a file and converting it to an iterable, so the community had to convert the iterable to a file just so pypowervm could convert it back.
13:45:11 which is stupid.
13:45:26 hmm...ok
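The simplification described above boils down to streaming the iterable straight through: an HTTP library like requests will send any iterable of byte chunks as a chunked request body, so there is no need to wrap Glance's iterable in a file-like adapter only to have the library turn it back into an iterator. A minimal, self-contained sketch of the idea; the URL, chunk size, and function name are illustrative assumptions, not the actual pypowervm API:

    import requests

    def stream_upload(url, data_iter, session=None):
        # requests treats a generator/iterable body as a chunked upload,
        # so the image data Glance hands us can be passed through untouched.
        sess = session or requests.Session()
        return sess.post(url, data=data_iter)

    # Example: a generator standing in for the iterable Glance returns.
    chunks = (b'x' * 8192 for _ in range(4))
    # stream_upload('https://host.example/rest/api/uploadfile', chunks)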
13:45:45 well, I guess I'll let you do magic and be on point to be a reviewer?
13:46:24 how do we make actions in the meeting?
13:46:41 (for the meeting minutes)
13:46:53 #action esberglu: Open bug for upload issue
13:48:12 I'm going to use 5083, which is already most of the way there. Just need to add the iterable killer.
13:48:41 #action efried drive pypowervm and nova-powervm fixes for upload issue
13:48:54 #action esberglu determine if newton is impacted
13:50:01 efried: 5083 - we should tag that with the bug that esberglu is making
13:50:08 so we have one bug capturing this whole nightmare...
13:50:27 #action adreznec ship out a new pypowervm once this whole fiasco is solved :-)
13:50:38 (heh)
13:50:44 lol
13:50:59 might need to sync that with julio
13:53:01 what else do we have for the meeting?
13:53:10 Cool, sounds like we have a plan. Meeting is almost up, anyone have anything else? I don't have anything for CI
13:53:17 efried: anything in-tree?
13:53:32 Just waiting for reviewers at this point, correct?
13:53:54 nbante and jay are still doing testing... I know jay is hitting issues, I'm trying to help out once I get in the env. nbante I think is stuck on something with tempest in OSA.
13:53:56 I think by the time we hit the SSP change set, we'll need to bump the in-tree reqs to the new pypowervm.
13:55:09 correct..adreznec: check once if we have to uncomment anything in user_config.yml to get that to work
13:56:20 nbante: you'll likely have to experiment on your own there today. I'm pretty much swamped in meetings until later this afternoon
13:57:21 sure..I already tried most of the parts. But will give it a shot in 2-3 hours. If it doesn't work, will send you a note
13:58:54 nbante: I may have time to take a look today. I will let you know
13:59:16 nbante: one thought I had...
13:59:20 sure..thanks
13:59:25 I don't think we care if tempest is deployed via OSA
13:59:30 or run from a separate server...
13:59:39 you have a cloud, we just need to run tempest against it
13:59:41 :-)
13:59:46 so that gives you options to try
14:00:10 but esberglu is more familiar than I am...so maybe he'll figure it out in 2 mins
14:00:25 I never tried tempest so not sure how it works. In SVT, we have our own framework
14:00:44 thorst: I haven't got tempest working yet either.... so I'm pretty much in the same boat as nbante right now for OSA CI
14:00:58 esberglu: right...but my thought is
14:01:03 we have tempest working for IT CI
14:01:06 or OOT CI
14:01:13 so...uh...how'd we set it up there?
14:01:21 and can we do the same here?
14:02:56 At least some tweaks will be needed. We can discuss more when I really dive into it
14:03:01 awesome
14:03:10 nbante: are you deployed with Cinder?
14:03:16 no
14:03:24 using local disk only
14:03:38 so we're just getting back to where we were?
14:04:55 after local disk, I worked on tempest, which is where I got stuck
14:05:16 do you want me to work on configuring cinder in parallel?
14:05:16 ok
14:05:25 ahh, right...so we were getting tempest working and then moving to iSCSI cinder
14:05:33 correct
14:05:33 sorry, I'm getting my wires crossed :-)
14:06:19 :)
14:06:40 OK - I'll also catch up with Jay...
14:06:51 can you check with him to get his IRC working?
14:07:21 sure..will check
14:07:50 ok - I didn't have anything else.
14:11:13 #endmeeting