| *** tosky has quit IRC | 00:06 | |
| *** kgiusti has left #openstack-oslo | 00:34 | |
| *** lbragstad has quit IRC | 01:09 | |
| *** rcernin has quit IRC | 01:31 | |
| *** rcernin has joined #openstack-oslo | 01:32 | |
| *** dave-mccowan has joined #openstack-oslo | 02:18 | |
| *** ducnv has joined #openstack-oslo | 02:54 | |
| *** ducnv has quit IRC | 02:56 | |
| *** ducnv has joined #openstack-oslo | 02:56 | |
| *** lbragstad has joined #openstack-oslo | 03:11 | |
| *** dave-mccowan has quit IRC | 04:00 | |
| *** bnemec has quit IRC | 04:26 | |
| *** ebbex has quit IRC | 04:26 | |
| *** cburgess has quit IRC | 04:26 | |
| *** stephenfin has quit IRC | 04:26 | |
| *** django has quit IRC | 04:26 | |
| *** sorrison has quit IRC | 04:26 | |
| *** bnemec has joined #openstack-oslo | 04:53 | |
| *** ebbex has joined #openstack-oslo | 04:53 | |
| *** cburgess has joined #openstack-oslo | 04:53 | |
| *** stephenfin has joined #openstack-oslo | 04:53 | |
| *** django has joined #openstack-oslo | 04:53 | |
| *** sorrison has joined #openstack-oslo | 04:53 | |
| *** rcernin has quit IRC | 04:56 | |
| *** rcernin has joined #openstack-oslo | 05:02 | |
| *** lbragstad has quit IRC | 05:33 | |
| *** jhesketh has quit IRC | 05:53 | |
| *** jhesketh has joined #openstack-oslo | 05:54 | |
| openstackgerrit | Merged openstack/automaton master: add python 3.7 unit test job https://review.openstack.org/610513 | 06:27 |
| *** e0ne has joined #openstack-oslo | 06:32 | |
| *** lxkong has quit IRC | 06:52 | |
| *** Luzi has joined #openstack-oslo | 06:54 | |
| *** e0ne has quit IRC | 07:17 | |
| *** lxkong has joined #openstack-oslo | 07:21 | |
| *** e0ne has joined #openstack-oslo | 07:21 | |
| *** pcaruana has joined #openstack-oslo | 07:25 | |
| *** rcernin has quit IRC | 07:27 | |
| *** aojea has joined #openstack-oslo | 07:33 | |
| *** e0ne has quit IRC | 08:04 | |
| *** moguimar has joined #openstack-oslo | 08:22 | |
| *** tosky has joined #openstack-oslo | 08:27 | |
| *** e0ne has joined #openstack-oslo | 10:25 | |
| openstackgerrit | Merged openstack/oslo.vmware master: SDRS recommendation for create VM https://review.openstack.org/640639 | 11:19 |
| openstackgerrit | Merged openstack/oslo.vmware master: Return None if no suitable datastore is found https://review.openstack.org/640640 | 11:20 |
| *** a-pugachev has joined #openstack-oslo | 11:25 | |
| *** a-pugachev_ has joined #openstack-oslo | 12:02 | |
| *** a-pugachev has quit IRC | 12:05 | |
| *** a-pugachev_ is now known as a-pugachev | 12:05 | |
| *** raildo has joined #openstack-oslo | 12:14 | |
| *** a-pugachev has quit IRC | 12:25 | |
| *** a-pugachev has joined #openstack-oslo | 12:28 | |
| openstackgerrit | Stephen Finucane proposed openstack/oslo.db stable/rocky: Resolve SAWarning in Query.soft_delete() https://review.openstack.org/641650 | 12:48 |
| stephenfin | bnemec: When you're about, would you mind looking at https://review.openstack.org/641650 ? | 12:49 |
| *** a-pugachev has quit IRC | 12:57 | |
| *** kgiusti has joined #openstack-oslo | 13:34 | |
| *** lbragstad has joined #openstack-oslo | 13:48 | |
| *** ansmith has joined #openstack-oslo | 13:49 | |
| *** moguimar has quit IRC | 15:00 | |
| *** Luzi has quit IRC | 15:35 | |
| *** a-pugachev has joined #openstack-oslo | 15:45 | |
| openstackgerrit | Merged openstack/oslo.db stable/rocky: Resolve SAWarning in Query.soft_delete() https://review.openstack.org/641650 | 15:45 |
| *** a-pugachev_ has joined #openstack-oslo | 15:56 | |
| *** a-pugachev has quit IRC | 15:58 | |
| *** a-pugachev has joined #openstack-oslo | 16:00 | |
| *** a-pugachev_ has quit IRC | 16:01 | |
| openstackgerrit | Stephen Finucane proposed openstack/oslo.db stable/queens: Resolve SAWarning in Query.soft_delete() https://review.openstack.org/641724 | 16:44 |
| *** e0ne has quit IRC | 17:07 | |
| *** a-pugachev has quit IRC | 17:46 | |
| *** irclogbot_2 has joined #openstack-oslo | 18:02 | |
| *** e0ne has joined #openstack-oslo | 18:19 | |
| *** e0ne has quit IRC | 18:24 | |
| *** e0ne has joined #openstack-oslo | 18:29 | |
| *** e0ne has quit IRC | 18:33 | |
| *** pcaruana has quit IRC | 18:46 | |
| *** pcaruana has joined #openstack-oslo | 19:05 | |
| openstackgerrit | Ben Nemec proposed openstack/oslo.cache stable/rocky: Fix memcache pool client in monkey-patched environments https://review.openstack.org/640500 | 19:30 |
| *** pcaruana has quit IRC | 19:33 | |
| *** mnaser has joined #openstack-oslo | 20:08 | |
| mnaser | dhellmann: mind if i pick at your brain at a change you helped push..... 6 years ago? :) | 20:08 |
| dhellmann | mnaser : uhm, you can try. :-) | 20:09 |
| mnaser | https://review.openstack.org/#/c/63444/3/openstack/common/service.py L195 in service.py | 20:09 |
| mnaser | i'm noticing an issue with nova where a reload is doing really weird behaviour (troubleshooted a tad with the nova team) | 20:09 |
| mnaser | and it seems that im concluding that https://github.com/openstack/oslo.service/blob/471bfe770808a0348fcce1e83753dcc414b361c1/oslo_service/service.py#L395-L396 should really be "if _is_sighup_and_daemon(signo):" | 20:10 |
| mnaser | because what that is doing is: if not (sighup and we're running as a daemon): break. this means that a reload is resulting in self.restart() being called | 20:10 |
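For reference, this is roughly the control flow being debated, paraphrased from the ServiceLauncher.wait() loop linked above (service.py L395-397). It is an illustrative sketch rather than verbatim oslo.service code; the surrounding plumbing is taken from that file as best as can be told.

```python
# Paraphrase of the wait loop around service.py L395-397 (illustrative only):
while True:
    self.handle_signal()                     # (re)install signal handlers
    status, signo = self._wait_for_exit_or_signal()
    if not _is_sighup_and_daemon(signo):     # True only for "SIGHUP while daemonized"
        break                                # SIGTERM/SIGINT etc. leave the loop and exit
    self.restart()                           # the SIGHUP path mnaser is flagging
```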
| dhellmann | is that not how the reload works? | 20:11 |
| mnaser | well, shouldn't the reload call self.reset() and not self.restart() ? | 20:11 |
| mnaser | nova assumes that a reload (sighup) only calls reset() -- not a full service restart | 20:12 |
| mnaser | maybe dansmith can comment better about this, if he has time.. i've just kinda tried to slowly figure this out | 20:12 |
| dansmith | dhellmann: the oslo service code assumes that we should only do a reload type action on sighup if we've self-daemonized, | 20:13 |
| dansmith | which is not really legit for a systemd-managed process, or anything running under a supervisor (like regular init for respawn, monit, etc) | 20:13 |
| dhellmann | I wonder if this logic predates that? | 20:13 |
| dansmith | predates the idea of running a service foreground style under a supervisor? no, | 20:14 |
| dhellmann | bnemec you've looked at the service code more recently than I have, I think | 20:14 |
| dansmith | but it probably does pre-date the uptick in systemd usage | 20:14 |
| dhellmann | yeah, systemd | 20:14 |
| dhellmann | I trust you both if you say it's wrong, but I don't know this lib well enough to actually confirm it if you're not sure | 20:15 |
| mnaser | i mean, i've only started running into this with our gates failing from a bit of odd behaviour showing up | 20:15 |
| mnaser | our = OSA | 20:15 |
| dansmith | dhellmann: I disagree with the assertion made in the commit message of the code that added this, so...I'm pretty sure :D | 20:15 |
| dhellmann | let's get a fix posted then | 20:17 |
| dansmith | like, it's not that the code is broken, it's that the thing the code is (properly) doing is fundamentally naive, I think | 20:17 |
| * bnemec passes the buck to zaneb :-) | 20:17 | |
| mnaser | given this is breaking the openstack-ansible gates, i can volunteer myself to fix it if someone knows what needs to be done (and doesn't have the time to :<) | 20:18 |
| * zaneb hides | 20:18 | |
| dansmith | mnaser: honestly, I'd just revert the is-daemon check part and call it good | 20:18 |
| bnemec | We have a maybe related bug too: https://bugs.launchpad.net/oslo.service/+bug/1794708 | 20:18 |
| openstack | Launchpad bug 1794708 in oslo.service "Service children uselessly restart on signals" [Undecided,New] | 20:18 |
| dansmith | but I'd defer to osloians about notice, deprecation, etc | 20:18 |
| mnaser | also dansmith found a related bug in neutron | 20:19 |
| mnaser | where it just dies on a sighup too | 20:19 |
| dansmith | bnemec: yeah, we have another bug where someone is doing something wrong in nova because they assume this is just how it's supposed to work | 20:19 |
| dansmith | and that | 20:19 |
| mnaser | https://bugs.launchpad.net/neutron/+bug/1780139 | 20:20 |
| openstack | Launchpad bug 1780139 in neutron "Sending SIGHUP to neutron-server process causes it to hang" [Undecided,Triaged] - Assigned to Bernard Cafarelli (bcafarel) | 20:20 |
| mnaser | this one | 20:20 |
| mnaser | oh jeez | 20:21 |
| mnaser | dansmith: https://bugs.launchpad.net/nova/+bug/1715374 | 20:21 |
| openstack | Launchpad bug 1715374 in OpenStack Compute (nova) "Reloading compute with SIGHUP prevents instances from booting" [High,In progress] - Assigned to Ralf Haferkamp (rhafer) | 20:21 |
| mnaser | would ya look at that | 20:21 |
| dansmith | mnaser: didn't I say "I hope to god this is not really happening because ... bad things" ? :) | 20:21 |
| dansmith | shutting down rpc and all | 20:22 |
| mnaser | well i guess we can reference that bug to fix it but it's been around a while | 20:22 |
| dansmith | mnaser: I have oslo.service pulled and am working on it | 20:22 |
| mnaser | dansmith: is it safe to reclassify those bugs as "invalid" for nova (not sure about neutron, but at least nova) and add that it affects oslo.service instead? | 20:23 |
| bnemec | FWIW, it sounds like TripleO hit this as well and worked around it instead: https://bugs.launchpad.net/tripleo/+bug/1789680 | 20:23 |
| openstack | Launchpad bug 1789680 in tripleo "mistral MessagingTimeout correlates with containerized undercloud uptime" [Critical,Fix released] - Assigned to Dougal Matthews (d0ugal) | 20:23 |
| dansmith | I don't think so | 20:23 |
| dansmith | they're legit they're just not bugs in nova | 20:23 |
| dansmith | bnemec: right, which is what was tried to do in nova.. | 20:23 |
| dansmith | and maybe neutron too | 20:23 |
| mnaser | dansmith: right so just to move the bug as a oslo.service bug rather than a nova bug | 20:24 |
| dansmith | add oslo.service I think | 20:24 |
| dansmith | so they affect both | 20:25 |
| mnaser | ok, updated https://bugs.launchpad.net/oslo.service/+bug/1715374 | 20:25 |
| openstack | Launchpad bug 1715374 in OpenStack Compute (nova) "Reloading compute with SIGHUP prevents instances from booting" [High,In progress] - Assigned to Ralf Haferkamp (rhafer) | 20:25 |
| dansmith | I assume that since that change was merged with zero tests (!), this should be easy ... | 20:26 |
| mnaser | guess by the logic of things you don't have to write any either :) | 20:27 |
| dansmith | le woot :) | 20:27 |
| dansmith | mnaser: can we do gate hackage to test this against your stuff before we merge | 20:28 |
| dansmith | ? | 20:28 |
| mnaser | dansmith: i think so. i'm trying to think if it can work or not, because it's not a direct dependency (i.e. nova or neutron) but a dependency of it | 20:28 |
| dansmith | mriedem is a ninja with that stuff | 20:29 |
| bnemec | Wow, I still have a draft comment on https://review.openstack.org/#/c/63444 | 20:29 |
| mnaser | i think what could be a way to hack around it is make a dnm change to nova pointing requirements to that wip oslo.service change | 20:30 |
| bnemec | However, it was a commit message nitpick as opposed to anything useful. | 20:30 |
| mnaser | and then we should be able to clone nova with updated requirements and pull it down | 20:30 |
| * mnaser wonders if there's an easy way to update requirements.txt to somehow point at a specific oslo change | 20:30 | |
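For reference, one common approach in Zuul-based gates is not to touch requirements.txt at all but to push a do-not-merge change whose commit message carries a Depends-On footer pointing at the oslo.service review. This only changes the installed library if the job builds oslo.service from source (e.g. via required-projects or LIBS_FROM_GIT), so treat the snippet below as a hedged sketch of the approach rather than a recipe; the review URL is the one posted just below.

```
DNM: exercise nova against the oslo.service SIGHUP fix

Depends-On: https://review.openstack.org/641793
```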
| openstackgerrit | Dan Smith proposed openstack/oslo.service master: Revert assertion made about SIGHUP behavior for non-daemon services https://review.openstack.org/641793 | 20:32 |
| dansmith | mnaser: ^ | 20:32 |
| dansmith | I tagged the oslo and nova bugs there. I think the others have been worked around so probably not worth mentioning, but if others care, we can | 20:32 |
| mnaser | dansmith: shouldn't it be "if is_sighup(signo): break"? | 20:33 |
| mnaser | because we want to breakout and not hit the self.restart() ? | 20:33 |
| dansmith | isn't it? | 20:33 |
| mnaser | because `_is_sighup_and_daemon` evaluated to true when i ran it in systemd (i actually did a LOG.debug() with a system reload afterwards) | 20:33 |
| mnaser | it's "if not _is_sighup(signo): break" which means break out _only_ if its not sighup unless im really botting out | 20:34 |
| dansmith | oh, I was blindly reverting the and daemon one, but it was "not sighup" before | 20:34 |
| mnaser | the daemon seemed to be working ok, systemd was saying that it was true, but if `_is_sighup_and_daemon` returns true on a reload, "if not true: break" => it won't break and hit self.restart() | 20:35 |
| dansmith | restart() is not what we want though right? | 20:35 |
| mnaser | if we do sighup i assume we _dont_ want a restart | 20:35 |
| *** e0ne has joined #openstack-oslo | 20:35 | |
| zaneb | I feel like if the 'not' was the wrong way around we'd have noticed that no (maskable) signals other than SIGHUP work for killing the process before now... | 20:36 |
| mnaser | https://github.com/openstack/oslo.service/blob/master/oslo_service/service.py#L395 | 20:36 |
| dansmith | zaneb: right that's what I was thinking | 20:36 |
| mnaser | from what i understand, that code will break out only if it's NOT sighup and daemon | 20:36 |
| mnaser | so if you get a reload, you will evaluate false, and it'll call self.restart() | 20:36 |
| mnaser | and i assume self.restart() is not what we want, but self.reset() ? | 20:36 |
| dansmith | mnaser: well we did handle_signal which calls the reset I think | 20:37 |
| dansmith | we want to not run restart though, AFAIK | 20:37 |
| mnaser | https://github.com/openstack/oslo.service/blob/master/oslo_service/service.py#L778-L784 | 20:37 |
| mnaser | we call .stop() | 20:37 |
| mnaser | then .reset() / | 20:37 |
| mnaser | so restart() is not an actual restart of process, but reset | 20:38 |
| mnaser | so we do need to call it, that code is right, but it seems like we are calling .stop() then .reset() | 20:38 |
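For reference, the call chain mnaser just walked, paraphrased from the linked lines; this is an illustrative sketch, not verbatim oslo.service code.

```python
import threading

# SIGHUP path per the links above (paraphrased):
#   ServiceLauncher.wait() -> self.restart()              (~service.py L397)
#     -> Launcher.restart(): reload config files, then    (~L314)
#        self.services.restart(), which looks roughly like:
class Services(object):
    def restart(self):
        self.stop()                           # stop every child service first
        self.done = threading.Event()
        for restart_service in self.services:
            restart_service.reset()           # then reset it...
            self.tg.add_thread(self.run_service, restart_service, self.done)
            # ...and start it running again: a "reload" is really stop + reset + start
```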
| dansmith | stop where? from _reload_service? | 20:38 |
| dansmith | I'm actually not sure why we're not exiting in that now | 20:38 |
| mnaser | https://github.com/openstack/oslo.service/blob/master/oslo_service/service.py#L397 | 20:39 |
| *** aojea has quit IRC | 20:39 | |
| mnaser | into | 20:39 |
| mnaser | https://github.com/openstack/oslo.service/blob/master/oslo_service/service.py#L314 | 20:39 |
| mnaser | into | 20:39 |
| mnaser | https://github.com/openstack/oslo.service/blob/master/oslo_service/service.py#L780 | 20:39 |
| mnaser | which does | 20:39 |
| mnaser | https://github.com/openstack/oslo.service/blob/master/oslo_service/service.py#L760-L762 | 20:39 |
| dansmith | no, I'm saying: | 20:40 |
| dansmith | https://github.com/openstack/oslo.service/blob/master/oslo_service/service.py#L360 | 20:40 |
| dansmith | calls https://github.com/openstack/oslo.service/blob/master/oslo_service/service.py#L342 | 20:40 |
| dansmith | which raises SignalExit | 20:40 |
| dansmith | oh wait I see | 20:40 |
| dansmith | it doesn't exit unless SYSTEMexit | 20:41 |
| mnaser | yeah but the stop still happens in self.restart() which _really_ is self.reset() | 20:41 |
| mnaser | and self.reset() seems to be a self.stop() followed by self.reset() | 20:42 |
| dansmith | not sure what you mean by that, but, | 20:42 |
| dansmith | if we break on that sighup thing we'll exit the while loop and drop out right? | 20:42 |
| mnaser | dansmith: yes | 20:43 |
| dansmith | which will bubble up to ProcessLauncher's while loop, | 20:43 |
| mnaser | but we don't actually want to break out, because without calling self.restart(), we won't reset | 20:43 |
| dansmith | which will also break and then os._exit right? | 20:43 |
| dansmith | I guess I'm confused about what you're suggesting needs to change, because just flipping the logic on "is not sighup" doesn't seem right | 20:44 |
| mnaser | dansmith: yes, i was wrong about that, i assumed self.restart() was a full service restart | 20:45 |
| mnaser | but in reality, self.restart() is a reset | 20:45 |
| mnaser | so all that logic is okay, but i think your assumption that Service.stop() is _not_ being called is not true, because it is being called on resets | 20:45 |
| mnaser | because of https://github.com/openstack/oslo.service/blob/master/oslo_service/service.py#L778-L784 | 20:46 |
| dansmith | but restart() does a stop() | 20:46 |
| dansmith | mnaser: stop() can't be called while we're running or we tear down rpc | 20:46 |
| mnaser | and we call self.manager.cleanup_host() in stop() in nova's service | 20:46 |
| dansmith | right | 20:46 |
| mnaser | which sets self._events = None and then causes the breakage | 20:47 |
| mnaser | so either self.reset() needs to rebuild everything that self.stop() tore down, or we stop calling self.stop() in oslo service and let services handle their own resets | 20:47 |
| dansmith | we have to make a change here that ends up with us calling Service.reset() without Service.stop() first, agreed? | 20:47 |
| mnaser | correct | 20:47 |
| dansmith | mnaser: it can't do the former because that means we kill in-flight rpc calls | 20:48 |
| mnaser | right, it seems that right now, oslo's default behaviour is to call stop() on the service, then reset(), then start() | 20:48 |
| mnaser | if you ask me, sounds like a normal restart.. | 20:48 |
| mnaser | oslo should just call reset() and do nothing else | 20:49 |
| dansmith | right.. reset should be pointless, | 20:49 |
| dansmith | although it's probably doing a reset because it doesn't re-create the Service() object because of the sequencing | 20:49 |
| dansmith | right | 20:49 |
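For reference, a hypothetical illustration of the behaviour being argued for here; this is not the patch under review, and the helper name below is made up for the sketch.

```python
def _handle_sighup(services):
    """Hypothetical SIGHUP handling: reset only, no stop()/start() cycle."""
    for service in services:      # the launcher's child Service objects
        service.reset()           # reload mutable config and the like
    # deliberately no service.stop(): in-flight RPC and other resources
    # created in start() stay untouched
```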
| dansmith | mnaser: I think there might be multiple paths in the same file here, | 20:50 |
| dansmith | because the ProcessLauncher version looks closer | 20:50 |
| dansmith | well, no it kills all children too | 20:50 |
| mnaser | yep | 20:50 |
| dansmith | but ProcessLauncher.wait() is replicating things from ServiceLauncher() | 20:50 |
| mnaser | i'm struggling to understand the why behind that | 20:51 |
| dansmith | mnaser: so I haven't eaten anything all day yet and am pretty foggy and need to go do that.. you wanna take that patch and hack on it? | 20:51 |
| dansmith | yeah, that seems weird to me too | 20:51 |
| dansmith | I'm guessing it's an mpm sort of "processes or threads" sort of thing | 20:51 |
| mnaser | https://github.com/openstack/neutron/blob/master/neutron/service.py#L137-L146 | 20:51 |
| mnaser | https://github.com/openstack/neutron/blob/9a15084375a80df830cf50c1fb20908b5d7f822c/neutron/common/config.py#L106-L112 | 20:52 |
| mnaser | im gonna guess neutron makes that same assumption | 20:52 |
| mnaser | and that's why it dies, it has no rpc after a reload | 20:52 |
| dansmith | yeah | 20:52 |
| mnaser | ok, this will be interesting. a fix in oslo is probably the fundamental way but i'll try to study the history behind the changes | 20:52 |
| mnaser | and see why we've decided to .stop() .reset() .start() | 20:52 |
| dansmith | I think it has to be a fix in oslo, because we can't really prevent the stop (cleanly) from in nova or neutron | 20:53 |
| mnaser | yeah, it's not like we'd be changing default behavior either, i'll dig | 20:53 |
| dansmith | ack, thanks | 20:54 |
| openstackgerrit | Ben Nemec proposed openstack/oslo.log master: Test bionic for legacy job https://review.openstack.org/641799 | 20:55 |
| *** e0ne has quit IRC | 21:14 | |
| *** mtreinish has joined #openstack-oslo | 21:30 | |
| *** kgiusti has left #openstack-oslo | 21:37 | |
| *** ansmith has quit IRC | 21:46 | |
| *** e0ne has joined #openstack-oslo | 22:01 | |
| *** raildo has quit IRC | 22:13 | |
| *** a-pugachev has joined #openstack-oslo | 22:21 | |
| *** a-pugachev has quit IRC | 22:22 | |
| *** tosky has quit IRC | 22:22 | |
| *** a-pugachev has joined #openstack-oslo | 22:22 | |
| *** ansmith has joined #openstack-oslo | 22:28 | |
| *** tosky has joined #openstack-oslo | 22:29 | |
| *** e0ne has quit IRC | 22:34 | |
| *** e0ne has joined #openstack-oslo | 22:34 | |
| *** e0ne has quit IRC | 22:36 | |
| *** rcernin has joined #openstack-oslo | 22:41 | |
| openstackgerrit | Ben Nemec proposed openstack/oslo.log master: Test bionic for legacy job https://review.openstack.org/641799 | 22:42 |
| *** purplerbot has quit IRC | 23:26 | |
| *** purplerbot has joined #openstack-oslo | 23:26 | |
| *** tosky has quit IRC | 23:34 | |