*** apearson has joined #openstack-powervm | 00:38 | |
*** apearson has quit IRC | 00:49 | |
*** svenkat has joined #openstack-powervm | 01:21 | |
*** edmondsw has joined #openstack-powervm | 01:36 | |
*** thorst has joined #openstack-powervm | 01:37 | |
*** edmondsw has quit IRC | 01:40 | |
*** thorst has quit IRC | 01:41 | |
*** svenkat has quit IRC | 02:07 | |
*** svenkat has joined #openstack-powervm | 02:08 | |
*** svenkat has quit IRC | 02:13 | |
*** deep-book-gk has joined #openstack-powervm | 02:34 | |
*** deep-book-gk has left #openstack-powervm | 02:36 | |
*** thorst has joined #openstack-powervm | 02:42 | |
*** thorst has quit IRC | 02:47 | |
*** svenkat has joined #openstack-powervm | 02:49 | |
*** svenkat has quit IRC | 02:53 | |
*** edmondsw has joined #openstack-powervm | 03:24 | |
*** edmondsw has quit IRC | 03:28 | |
*** thorst has joined #openstack-powervm | 03:43 | |
*** thorst has quit IRC | 03:48 | |
*** svenkat has joined #openstack-powervm | 04:21 | |
*** svenkat has quit IRC | 04:25 | |
*** svenkat has joined #openstack-powervm | 05:01 | |
*** svenkat has quit IRC | 05:05 | |
*** thorst has joined #openstack-powervm | 05:44 | |
*** thorst has quit IRC | 05:49 | |
*** svenkat has joined #openstack-powervm | 06:23 | |
*** svenkat has quit IRC | 06:27 | |
*** edmondsw has joined #openstack-powervm | 06:59 | |
*** edmondsw has quit IRC | 07:04 | |
*** thorst has joined #openstack-powervm | 07:45 | |
*** thorst has quit IRC | 07:50 | |
*** edmondsw has joined #openstack-powervm | 08:48 | |
*** edmondsw has quit IRC | 08:53 | |
*** thorst has joined #openstack-powervm | 09:46 | |
*** thorst has quit IRC | 09:58 | |
*** edmondsw has joined #openstack-powervm | 10:36 | |
*** edmondsw has quit IRC | 10:41 | |
*** smatzek has joined #openstack-powervm | 10:49 | |
*** smatzek has quit IRC | 10:55 | |
*** smatzek has joined #openstack-powervm | 10:55 | |
*** thorst has joined #openstack-powervm | 11:13 | |
*** miltonm has joined #openstack-powervm | 11:39 | |
*** svenkat has joined #openstack-powervm | 11:40 | |
*** thorst has quit IRC | 11:43 | |
*** edmondsw has joined #openstack-powervm | 12:06 | |
*** edmondsw has quit IRC | 12:13 | |
*** miltonm has quit IRC | 12:26 | |
*** kylek3h has joined #openstack-powervm | 12:57 | |
*** thorst has joined #openstack-powervm | 13:00 | |
*** dwayne_ has quit IRC | 13:02 | |
*** apearson has joined #openstack-powervm | 13:04 | |
*** esberglu has joined #openstack-powervm | 13:13 | |
*** edmondsw has joined #openstack-powervm | 13:13 | |
*** edmondsw has quit IRC | 13:22 | |
*** edmondsw has joined #openstack-powervm | 13:22 | |
*** apearson has quit IRC | 13:59 | |
*** apearson has joined #openstack-powervm | 14:00 | |
*** apearson has quit IRC | 14:01 | |
*** apearson has joined #openstack-powervm | 14:01 | |
*** apearson has quit IRC | 14:02 | |
*** dwayne has joined #openstack-powervm | 14:10 | |
esberglu | efried_zzz: thorst: edmondsw: Any idea how to resolve this? The CI cloud thinks all 135 IPs are used even though there are only 9 instances | 14:46 |
esberglu | Is there some way to remove these "used" ips that aren't actually being used? | 14:46 |
thorst | esberglu: yeah, though it's concerning that happened | 14:47 |
thorst | run a 'neutron port-delete' from the CLI on the openstack server | 14:47 |
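[A minimal sketch of the cleanup thorst describes, assuming admin credentials are sourced on the OpenStack server and the (since-deprecated) neutron CLI is available; the DOWN-status heuristic and the awk column positions are assumptions, not a verified recipe:]

```sh
# List port IDs with their status; orphaned ports left behind by failed
# spawns typically show DOWN with no live instance behind them.
neutron port-list -c id -c status -f value

# Delete each orphaned port to release its IP back to the pool.
# (Assumption: DOWN is a safe filter here; verify each port really has
# no backing instance before deleting.)
for port in $(neutron port-list -c id -c status -f value | awk '$2 == "DOWN" {print $1}'); do
    neutron port-delete "$port"
done
```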
esberglu | thorst: What I think happened is that the controller's disk was full, so a TON of spawns were failing over the weekend. Some of them must not have been cleaning up | 14:48 |
esberglu | Might have to up the controller disk size | 14:48 |
esberglu | The switch from 15G to 30G images really takes up some space since there are multiple images sitting around | 14:49 |
thorst | :-( | 14:53 |
thorst | we could increase it, but you'd need to rebuild the whole system | 14:53 |
thorst | that's kinda a PITA | 14:53 |
esberglu | thorst: Yeah. Maybe next time I do a full redeploy I will bite the bullet. I doubled it previously thinking that would be enough; wish I had done more | 14:59 |
thorst | never enough storage | 15:12 |
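[Since the root cause was a full controller disk, a quick confirmation sketch; the glance image path below is an assumed devstack-style default, not a verified location on this deployment:]

```sh
# Overall disk pressure on the controller.
df -h /

# What the stacked-up 30G images are actually consuming.
# (Assumption: devstack default image store path; adjust as needed.)
du -sh /opt/stack/data/glance/images

# Cross-check against what glance thinks it has registered.
openstack image list --long
```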
*** edmondsw has quit IRC | 16:37 | |
*** apearson has joined #openstack-powervm | 16:40 | |
*** efried_zzz is now known as efried | 16:44 | |
*** dwayne has quit IRC | 17:18 | |
esberglu | efried: Those 500 internal server error runs also have this trace in n-cond-cell1.log | 17:54 |
esberglu | http://paste.openstack.org/show/617053/ | 17:54 |
esberglu | You know anything about nova conductor? | 17:54 |
efried | no | 17:54 |
efried | What makes you think that's related? | 17:55 |
efried | Unless it has something to do with data that's normally populated being None, and that funneling down to the REST request, and that hitting some unhandled error path in REST/VIOS. | 17:55 |
esberglu | This is now showing up where we would see the internal error 500 message in powervm_os_ci.html | 17:56 |
esberglu | Looking in n-cpu the 500 error is still found on the failing test | 17:56 |
esberglu | And that 500 error is also printed in n-cond-cell1.log | 17:57 |
esberglu | http://184.172.12.213/55/408955/71/check/nova-in-tree-pvm/e41038c/ | 17:58 |
esberglu | ^ logs I have been looking at | 17:58 |
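[A hedged sketch of the correlation esberglu describes, grepping the downloaded CI logs for the 500 in both services; the file names come from the chat, but the search patterns are guesses at the message text:]

```sh
# Find the HTTP 500 in the compute log for the failing test...
grep -n "500 Internal Server Error" n-cpu.log

# ...and check whether the same window shows the conductor trace
# (the CantStartEngineError that comes up just below).
grep -n -A5 "CantStartEngineError" n-cond-cell1.log
```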
*** apearson has quit IRC | 17:59 | |
efried | Maybe we're talking about two different things. | 18:02 |
efried | The 500 I care about is the one coming from *our* REST API, on a VIOS POST. | 18:03 |
efried | The odds of that 500 being related in any way to a 500 coming from/through the conductor are virtually nil. | 18:04 |
efried | The problem I have with the 500 from VIOS POST isn't necessarily that it's happening - it's that we have no idea *why* it's happening, because the message is generic. Normally when we get a 500 from our REST API, it has some hint as to what went wrong. | 18:04 |
efried | So that's the first thing we need to nail down and fix. That's the thrust of the defect we opened. | 18:05 |
efried | In order to figure that out, they're gonna need the REST logs. | 18:05 |
efried | Then once we know what the real reason is, we a) get that message fixed, and b) see if it's something we can do anything about, or if it's just a random failure we have to tolerate (like "VIOS busy"). | 18:05 |
esberglu | Look at the first line of the paste above; it is the 500 from the VIOS, no? | 18:06 |
esberglu | The sql_connection error could just be a result of the 500 though, not sure | 18:06 |
esberglu | The CantStartEngineError that is | 18:07 |
efried | ah, okay, I see the confusion. So yeah, if there's any correlation at all, the conductor error is the result of the REST 500. Doesn't affect how we follow up. | 18:08 |
efried | esberglu Scanning the code, I *think* the exception you pasted above is actually ignored. Our 500 happens in the try: block at https://github.com/openstack/nova/blob/master/nova/conductor/manager.py#L536-L551 -- then the exception you're seeing is generated where it tries to clean up from that (L561), and that exception gets logged but ignored (L563) | 18:20 |
efried | In any case, our job is to figure out why we're getting that 500. | 18:21 |
esberglu | efried: I should just be able to scp from the neo onto the AIO vm I think | 18:32 |
*** dwayne has joined #openstack-powervm | 18:37 | |
*** dwayne has quit IRC | 18:45 | |
efried | esberglu You mean scp the REST logs? Yes, if you can figure out which neo it is. Which should be possible in the same way we're doing the local2remote setup. | 18:56 |
efried | esberglu But I thought we had already done that (at least once, manually) and supplied that info in the defect | 18:56 |
efried | Perhaps we just need to put pressure on the defect owner(s) to take a look. | 18:57 |
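[A sketch of the log pull efried suggests, assuming you can resolve which neo backed the failing run; the hostname and REST log path below are placeholders, not verified paths:]

```sh
# Identify the host that served the failing run, e.g. from the
# hypervisor list on the CI cloud controller.
openstack hypervisor list

# Pull the PowerVM REST logs from that neo onto the AIO VM.
# (Assumption: logs live under /var/log/pvm-rest; substitute the
# real host and path.)
mkdir -p rest-logs
scp neo-host:/var/log/pvm-rest/* ./rest-logs/
```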
*** apearson has joined #openstack-powervm | 19:07 | |
*** dwayne has joined #openstack-powervm | 19:24 | |
*** edmondsw has joined #openstack-powervm | 19:30 | |
*** edmondsw has quit IRC | 19:32 | |
*** edmondsw has joined #openstack-powervm | 19:32 | |
efried | esberglu Ah, that defect was owned by me. That's why it wasn't going anywhere. Needs to be owned by REST. Moved it to changh. | 19:33 |
esberglu | efried: Oh, I opened it in pypowervm; that's why. Thought it would automatically reroute when I changed that | 19:43 |
esberglu | Makes sense that it doesn't I guess | 19:43 |
*** thorst is now known as thorst_afk | 20:25 | |
efried | Doesn't make sense to me. | 20:28 |
*** apearson has quit IRC | 20:45 | |
*** apearson has joined #openstack-powervm | 20:47 | |
*** apearson has quit IRC | 20:58 | |
*** svenkat has quit IRC | 20:59 | |
*** apearson has joined #openstack-powervm | 21:00 | |
edmondsw | esberglu check the note I just sent | 21:09 |
edmondsw | we must be running tox all the time, and we haven't been seeing this, right? | 21:09 |
esberglu | I haven't seen it anywhere, and tox gets run on every patch | 21:15 |
*** smatzek has quit IRC | 21:16 | |
edmondsw | esberglu do we maybe have wsgi_intercept installed? | 21:16 |
esberglu | edmondsw: Not sure. That's part of the Jenkins gating jobs, not the PowerVM CI | 21:18 |
esberglu | edmondsw: Not seeing anything about wsgi_intercept in the logs from those runs | 21:24 |
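[To answer edmondsw's question directly, a quick check inside the tox virtualenv; the env name py27 is an assumption -- use whichever env the failing run targets:]

```sh
# Is wsgi_intercept installed in the tox env at all?
# (grep -i wsgi catches both wsgi_intercept and wsgi-intercept spellings)
.tox/py27/bin/pip freeze | grep -i wsgi

# Re-run the failing target to try to reproduce.
tox -e py27
```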
*** apearson has quit IRC | 21:33 | |
*** apearson has joined #openstack-powervm | 21:40 | |
*** esberglu has quit IRC | 21:51 | |
*** esberglu has joined #openstack-powervm | 21:57 | |
*** kylek3h has quit IRC | 21:58 | |
*** apearson has quit IRC | 22:01 | |
*** esberglu has quit IRC | 22:02 | |
*** esberglu has joined #openstack-powervm | 22:09 | |
*** esberglu has quit IRC | 22:14 | |
*** dwayne has quit IRC | 22:18 | |
edmondsw | esberglu fyi, I was able to reproduce that wsgi_intercept problem on a fresh system | 22:27 |
edmondsw | so it's not looking like an env issue | 22:27 |
*** thorst_afk has quit IRC | 22:38 | |
*** edmondsw has quit IRC | 22:52 | |
*** dwayne has joined #openstack-powervm | 23:20 | |
*** smatzek has joined #openstack-powervm | 23:51 |